In Youka, you can use the “Split model” feature to separate the vocals from the instrumentals in a song. You have four options:
- Demucs: Demucs (Deep Extractor for Music Sources) is a state-of-the-art deep learning model for music source separation. It operates in the time domain, preserving the temporal details of audio, and uses a convolutional neural network (CNN) architecture. Demucs is known for its high-quality separation of vocals, drums, bass, and other instruments, making it a top choice for tasks like karaoke creation and music production. It has evolved through multiple versions, each improving on the last, and is recognized for its strong performance in Signal-to-Distortion Ratio (SDR) benchmarks (GitHub) (QuadraphonicQuad).
- MDX-23C: The MDX-23C model is designed for advanced music demixing tasks, specifically targeting the separation of music into four stems: bass, drums, vocals, and other instruments. This model is based on a blend of Demucs4 and MDX neural net architectures and incorporates certain weights from the Ultimate Vocal Remover project. MDX-23C offers high-quality separation and is particularly effective when used with a powerful GPU setup, making it a strong choice for users looking for precise and professional-grade audio separation (GitHub) (QuadraphonicQuad).
- ReFormer: ReFormer is a relatively new entrant in the field of music demixing, known for its innovative approach to separating stems in music tracks. While detailed information about ReFormer is less widespread, it is reputed to combine traditional signal processing techniques with modern deep learning methods to achieve clean and accurate separation. This model aims to balance quality and processing speed, making it suitable for both professional and hobbyist users.
- MDX-Net (with backing vocals): MDX-Net is a two-stream neural network specifically developed for music demixing, featuring both a time-frequency branch and a time-domain branch. This architecture allows the model to separate stems by analyzing different aspects of the audio, combining the outputs from both streams to generate highly accurate separations. MDX-Net has proven its effectiveness by securing top positions in international music demixing challenges, making it a reliable option for users who need high precision in their audio processing (GitHub).
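
The Signal-to-Distortion Ratio (SDR) mentioned above is the usual yardstick for comparing separation models: a higher value means the separated stem is closer to the true stem. The sketch below is only an illustration, not part of Youka or of any model listed here. It assumes the simple energy-ratio definition of SDR, and the function name `sdr_db` and the toy signals are hypothetical.

```python
import math

def sdr_db(reference, estimate, eps=1e-12):
    """Signal-to-Distortion Ratio in dB (simple energy-ratio form, an assumption).

    reference: samples of the true stem (e.g. the isolated vocal)
    estimate:  samples of the stem produced by a separation model
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero and log(0) for silent or perfect inputs
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

# Toy data: one second of a 440 Hz tone standing in for a vocal stem.
sample_rate = 8000
vocal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Hypothetical estimate 1: nearly perfect, only a 1% gain error.
clean_estimate = [0.99 * v for v in vocal]

# Hypothetical estimate 2: the vocal plus leftover 110 Hz "bass" bleed.
leaky_estimate = [
    v + 0.3 * math.sin(2 * math.pi * 110 * n / sample_rate)
    for n, v in enumerate(vocal)
]

print("clean estimate SDR (dB):", round(sdr_db(vocal, clean_estimate), 1))
print("leaky estimate SDR (dB):", round(sdr_db(vocal, leaky_estimate), 1))
```

The estimate with instrumental bleed scores a noticeably lower SDR than the nearly perfect one, which is the kind of difference benchmark tables for these models report.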