Top Related Projects
Magenta: Music and Art Generation with Machine Intelligence
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Code for the paper "Jukebox: A Generative Model for Music"
An AI for Music Generation
Quick Overview
Muzic is a deep learning-based music generation system developed by Microsoft Research. It is designed to generate high-quality music compositions by learning from a large corpus of musical data. The project aims to explore the potential of artificial intelligence in the field of music creation.
Pros
- Innovative Approach: Muzic utilizes advanced deep learning techniques to generate novel and creative musical compositions.
- Diverse Music Generation: The system is capable of generating music in a variety of genres and styles, showcasing its versatility.
- Potential for Artistic Collaboration: Muzic could be used as a tool to assist human composers and musicians in the creative process.
- Open-Source: The project is open-source, allowing for community contributions and further development.
Cons
- Limited Evaluation: The project's evaluation of the generated music's quality and creativity is not extensively documented.
- Computational Complexity: The deep learning models used in Muzic may require significant computational resources, limiting its accessibility.
- Lack of User-Friendly Interface: The project currently lacks a user-friendly interface, making it challenging for non-technical users to interact with.
- Potential Ethical Concerns: The use of AI-generated music in commercial or artistic contexts may raise ethical questions about authorship and intellectual property.
Code Examples
N/A (Muzic is a collection of research codebases rather than an installable library; see each project's folder for runnable examples)
Getting Started
N/A (see the Requirements section of the README below for setup instructions)
Competitor Comparisons
Magenta: Music and Art Generation with Machine Intelligence
Pros of Magenta
- More mature project with a larger community and longer development history
- Broader scope, covering various aspects of music and art generation
- Extensive documentation and tutorials for easier onboarding
Cons of Magenta
- Less focused on cutting-edge AI music generation techniques
- May have a steeper learning curve for beginners due to its broader scope
- Potentially slower development cycle compared to more specialized projects
Code Comparison
Magenta (Python):
import magenta
melody = magenta.music.Melody(
    notes=[60, 62, 64, 65, 67, 69, 71, 72],
    start_step=0,
    steps_per_quarter=4
)
Muzic (Python):
from muzic import MusicGenerator
generator = MusicGenerator()
melody = generator.generate_melody(
    length=8,
    scale='C_major'
)
Both repositories offer tools for music generation, but Magenta provides a more comprehensive set of features for various musical tasks, while Muzic focuses specifically on AI-driven music generation. Magenta's code tends to be more low-level, giving users fine-grained control over musical elements. Muzic, on the other hand, aims for a higher-level interface, potentially making it more accessible for quick prototyping and experimentation with AI-generated music.
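The note list in the Magenta snippet uses MIDI pitch numbers (60 is middle C), which is what "fine-grained control over musical elements" means in practice. As a self-contained illustration — a hypothetical sketch using no library from either project — here is how such a pitch list can be built and manipulated directly:

```python
# A major scale follows the interval pattern
# whole-whole-half-whole-whole-whole-half (in semitones).
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]

def major_scale(root):
    """Return the eight MIDI pitches of a major scale starting at root."""
    notes = [root]
    for step in MAJOR_STEPS:
        notes.append(notes[-1] + step)
    return notes

def transpose(notes, semitones):
    """Shift every pitch by a fixed number of semitones."""
    return [n + semitones for n in notes]

c_major = major_scale(60)        # [60, 62, 64, 65, 67, 69, 71, 72]
d_major = transpose(c_major, 2)  # same contour, two semitones higher
```

Starting from 60, the pattern reproduces exactly the note list passed to `magenta.music.Melody` above; higher-level interfaces hide this arithmetic behind arguments like `scale='C_major'`.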
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Pros of Audiocraft
- More comprehensive audio generation capabilities, including music, sound effects, and speech synthesis
- Offers pre-trained models for immediate use, such as MusicGen and AudioGen
- Actively maintained with recent updates and contributions
Cons of Audiocraft
- Focused primarily on audio generation, lacking some music analysis features found in Muzic
- Requires more computational resources due to its advanced models
- Steeper learning curve for users new to audio AI technologies
Code Comparison
Audiocraft example (audio generation):
import torchaudio
from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('medium')
wav = model.generate_unconditional(4, progress=True)
torchaudio.save('generated_music.wav', wav[0].cpu(), model.sample_rate)
Muzic example (music analysis):
from muzic import MusicAnalyzer
analyzer = MusicAnalyzer()
features = analyzer.extract_features('song.mp3')
print(features)
Summary
Audiocraft excels in audio generation tasks, offering powerful pre-trained models for various audio synthesis applications. Muzic, on the other hand, provides a broader range of music-specific analysis tools. While Audiocraft is more suitable for creating new audio content, Muzic is better suited for tasks involving music understanding and analysis.
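The `MusicAnalyzer` snippet above is a high-level illustration; concretely, audio feature extraction reduces to computations over the signal's spectrum. Here is a minimal NumPy-only sketch of one classic feature, the spectral centroid — a hypothetical example, not code from either repository:

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of the signal's spectrum, in Hz."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

# A pure 440 Hz sine should have a centroid very close to 440 Hz.
sr = 22050
t = np.arange(sr) / sr                # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)
print(spectral_centroid(tone, sr))    # close to 440.0
```

Real analyzers stack many such features (MFCCs, chroma, onset strength) computed over short windows rather than the whole signal.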
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Pros of NeMo
- Broader scope: Covers speech recognition, natural language processing, and text-to-speech, not just music generation
- More active development: More frequent updates and contributions
- Better documentation and examples for getting started
Cons of NeMo
- Steeper learning curve due to its broader scope
- Requires more computational resources for training and inference
- Less focused on music-specific tasks compared to Muzic
Code Comparison
Muzic example (symbolic music generation):
from muzic.models import MusicTransformer
model = MusicTransformer.from_pretrained('music-transformer-midi')
generated = model.generate(max_length=512, temperature=1.0)
NeMo example (speech recognition):
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
transcription = asr_model.transcribe(["audio_file.wav"])
Both repositories offer high-level APIs for their respective tasks, but NeMo's broader scope is evident in its more diverse set of models and functionalities. Muzic focuses specifically on music-related tasks, while NeMo covers a wider range of audio and speech processing applications.
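The `temperature` argument in the Muzic snippet above rescales the model's output logits before sampling: values below 1 sharpen the distribution toward the most likely tokens, values above 1 flatten it toward uniform. A minimal NumPy sketch of the mechanism (illustrative, not code from either repository):

```python
import numpy as np

def temperature_softmax(logits, temperature):
    """Softmax over logits divided by temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.5])
sharp = temperature_softmax(logits, 0.5)  # low T: mass concentrates on argmax
flat = temperature_softmax(logits, 2.0)   # high T: closer to uniform
```

With `temperature=1.0`, as in the Muzic call, the logits are sampled unmodified; lowering it trades diversity for coherence.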
Code for the paper "Jukebox: A Generative Model for Music"
Pros of Jukebox
- More advanced and capable of generating complete songs with vocals
- Produces higher quality audio output
- Offers pre-trained models for immediate use
Cons of Jukebox
- Requires significant computational resources
- Less focused on music theory and composition
- Limited customization options for users
Code Comparison
Muzic (Python):
from muzic import MusicGenerator
generator = MusicGenerator()
melody = generator.generate_melody(length=16, scale='C_major')
Jukebox (Python):
import jukebox
from jukebox.sample import sample_partial_window
model = jukebox.load_model('1b_lyrics')
sample = sample_partial_window(model, ...)
Key Differences
- Muzic focuses on symbolic music generation and analysis
- Jukebox specializes in raw audio generation, including vocals
- Muzic offers more tools for music theory and composition
- Jukebox provides end-to-end audio generation capabilities
Use Cases
Muzic:
- Music education and theory applications
- Algorithmic composition and MIDI generation
- Music analysis and research
Jukebox:
- AI-generated complete songs with vocals
- Audio synthesis and sound design
- Exploring advanced deep learning for music generation
An AI for Music Generation
Pros of MuseGAN
- Focused specifically on multi-track music generation
- Provides pre-trained models for immediate use
- Includes a comprehensive evaluation metrics suite
Cons of MuseGAN
- Less actively maintained (last update in 2019)
- Limited to MIDI format generation
- Narrower scope compared to Muzic's diverse music AI tasks
Code Comparison
MuseGAN (model definition):
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, n_tracks, n_bars, n_steps_per_bar, n_pitches):
        super().__init__()
        self.z_dim = 32
        self.hidden_dim = 512
        self.n_tracks = n_tracks
        self.n_bars = n_bars
        self.n_steps_per_bar = n_steps_per_bar
        self.n_pitches = n_pitches
Muzic (model usage):
from muzic.models import MusicTransformer
model = MusicTransformer()
generated_music = model.generate(
    prompt="C4 E4 G4",
    max_length=512,
    temperature=0.9
)
Both repositories focus on AI-driven music generation, but Muzic offers a broader range of tools and more recent updates. MuseGAN provides a specialized approach for multi-track generation, while Muzic encompasses various music AI tasks with a more extensive toolkit.
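MuseGAN's generator is parameterized by tracks, bars, steps per bar, and pitches, which are the dimensions of the multi-track piano-roll tensor it generates. A small NumPy sketch of that representation, using the same dimension names with hypothetical example values:

```python
import numpy as np

# Dimensions mirroring the Generator signature above (values are examples).
n_tracks, n_bars, n_steps_per_bar, n_pitches = 4, 2, 16, 84

# A binary piano roll: piano_roll[track, bar, step, pitch] == 1 means
# that track sounds that pitch at that time step.
piano_roll = np.zeros(
    (n_tracks, n_bars, n_steps_per_bar, n_pitches), dtype=np.int8
)

# Place a C-major triad (pitches 60, 64, 67) on track 0,
# first step of the first bar.
for pitch in (60, 64, 67):
    piano_roll[0, 0, 0, pitch] = 1

print(piano_roll.shape)  # (4, 2, 16, 84)
```

The generator maps a latent vector of size `z_dim` to a tensor of this shape, one slice per instrument track.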
README
Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence. Muzic is pronounced [ˈmjuːzeɪk]. Besides the image version of the logo (see above), Muzic also has a video version. Muzic was started by researchers from Microsoft Research Asia and is also contributed to by outside collaborators.
We summarize the scope of our Muzic project in the following figure:
The current work in Muzic includes:
- Music Understanding: MusicBERT, PDAugment, CLaMP
- Music Generation
  - Song Writing
    - Lyric-to-Melody and Melody-to-Lyric: SongMASS
    - Lyric Generation: DeepRapper
    - Lyric-to-Melody Generation: TeleMelody, ReLyMe, Re-creation of Creations (ROC)
  - Music Form/Structure Generation
    - Music Form Generation: MeloForm
    - Long/Short Structure Modeling: Museformer
  - Multi-Track Generation: PopMAG, GETMusic
  - Text-to-Music Generation: MuseCoco
  - Singing Voice Synthesis: HiFiSinger
- AI Agent: MusicAgent
You can find music samples generated by our systems at https://ai-muzic.github.io/.
For more speech-related research, see https://speechresearch.github.io/ and https://github.com/microsoft/NeuralSpeech.
We are hiring!
We are hiring both research FTEs and research interns on Speech/Audio/Music/Video and LLMs. Please get in touch with Xu Tan (tanxu2012@gmail.com) if you are interested.
What is New?
- CLaMP has won the Best Student Paper Award at ISMIR 2023!
- We release MusicAgent, an AI agent for versatile music processing using large language models.
- We release MuseCoco, a music composition copilot to generate symbolic music from text.
- We release GETMusic, a versatile music copilot with a universal representation and diffusion framework to generate any music tracks.
- We release the first model for cross-modal symbolic MIR: CLaMP.
- We release two new research works on music structure modeling: MeloForm and Museformer.
- We give a tutorial on AI Music Composition at ACM Multimedia 2021.
Requirements
The operating system is Linux; we have tested on Ubuntu 16.04.6 LTS with CUDA 10 and Python 3.6.12. The dependencies are listed in requirements.txt. To install them, run:
pip install -r requirements.txt
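Before installing, it can help to confirm that the interpreter meets the stated baseline; since Muzic is tested with Python 3.6.12, a minimal sanity check (a generic snippet, not part of the repository) is:

```python
import sys

# Muzic's README reports testing with Python 3.6.12; require at least 3.6.
MINIMUM = (3, 6)

if sys.version_info[:2] < MINIMUM:
    raise RuntimeError(
        f"Python {MINIMUM[0]}.{MINIMUM[1]}+ required, "
        f"found {sys.version.split()[0]}"
    )
print("Python version OK:", sys.version.split()[0])
```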
We release the code of several research projects: MusicBERT, PDAugment, CLaMP, DeepRapper, SongMASS, TeleMelody, ReLyMe, Re-creation of Creations (ROC), MeloForm, Museformer, GETMusic, MuseCoco, and MusicAgent. See the README in the corresponding folder for detailed usage instructions.
Reference
If you find the Muzic project useful in your work, you can cite the papers as follows:
- [1] MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, ACL 2021.
- [2] PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription, Chen Zhang, Jiaxing Yu, Luchin Chang, Xu Tan, Jiawei Chen, Tao Qin, Kejun Zhang, ISMIR 2022.
- [3] DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling, Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu, ACL 2021.
- [4] SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint, Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin, AAAI 2021.
- [5] TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method, Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu, EMNLP 2022.
- [6] ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships, Chen Zhang, LuChin Chang, Songruoyao Wu, Xu Tan, Tao Qin, Tie-Yan Liu, Kejun Zhang, ACM Multimedia 2022.
- [7] Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation, Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan, arXiv 2022.
- [8] MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks, Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu, ISMIR 2022.
- [9] Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation, Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu, NeurIPS 2022.
- [10] PopMAG: Pop Music Accompaniment Generation, Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu, ACM Multimedia 2020.
- [11] HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis, Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu, arXiv 2020.
- [12] CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval, Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun, ISMIR 2023, Best Student Paper Award.
- [13] GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework, Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan, arXiv 2023.
- [14] MuseCoco: Generating Symbolic Music from Text, Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian, arXiv 2023.
- [15] MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models, Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian, EMNLP 2023 Demo.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.