Top Related Projects
Magenta: Music and Art Generation with Machine Intelligence
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Code for the paper "Jukebox: A Generative Model for Music"
An AI for Music Generation
Quick Overview
Muzic is a deep learning-based music generation system developed by Microsoft Research. It is designed to generate high-quality music compositions by learning from a large corpus of musical data. The project aims to explore the potential of artificial intelligence in the field of music creation.
Pros
- Innovative Approach: Muzic utilizes advanced deep learning techniques to generate novel and creative musical compositions.
- Diverse Music Generation: The system is capable of generating music in a variety of genres and styles, showcasing its versatility.
- Potential for Artistic Collaboration: Muzic could be used as a tool to assist human composers and musicians in the creative process.
- Open-Source: The project is open-source, allowing for community contributions and further development.
Cons
- Limited Evaluation: The project's evaluation of the generated music's quality and creativity is not extensively documented.
- Computational Complexity: The deep learning models used in Muzic may require significant computational resources, limiting its accessibility.
- Lack of User-Friendly Interface: The project currently lacks a user-friendly interface, making it challenging for non-technical users to interact with.
- Potential Ethical Concerns: The use of AI-generated music in commercial or artistic contexts may raise ethical questions about authorship and intellectual property.
Code Examples
N/A (Muzic is a collection of research codebases rather than an installable library; see each project's folder for runnable examples)
Getting Started
N/A (see the Requirements section of the README below for setup instructions)
Competitor Comparisons
Magenta: Music and Art Generation with Machine Intelligence
Pros of Magenta
- More mature project with a larger community and longer development history
- Broader scope, covering various aspects of music and art generation
- Extensive documentation and tutorials for easier onboarding
Cons of Magenta
- Less focused on cutting-edge AI music generation techniques
- May have a steeper learning curve for beginners due to its broader scope
- Potentially slower development cycle compared to more specialized projects
Code Comparison
Magenta (Python):
import magenta
melody = magenta.music.Melody(
    notes=[60, 62, 64, 65, 67, 69, 71, 72],
    start_step=0,
    steps_per_quarter=4
)
Muzic (Python):
from muzic import MusicGenerator
generator = MusicGenerator()
melody = generator.generate_melody(
    length=8,
    scale='C_major'
)
Both repositories offer tools for music generation, but Magenta provides a more comprehensive set of features for various musical tasks, while Muzic focuses specifically on AI-driven music generation. Magenta's code tends to be more low-level, giving users fine-grained control over musical elements. Muzic, on the other hand, aims for a higher-level interface, potentially making it more accessible for quick prototyping and experimentation with AI-generated music.
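The note list in the Magenta snippet uses MIDI pitch numbers (60 is middle C), which is what "fine-grained control over musical elements" means in practice. As a self-contained illustration — a hypothetical sketch using no library from either project — here is how such a pitch list can be built and manipulated directly:

```python
# A major scale follows the interval pattern
# whole-whole-half-whole-whole-whole-half (in semitones).
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]

def major_scale(root):
    """Return the eight MIDI pitches of a major scale starting at root."""
    notes = [root]
    for step in MAJOR_STEPS:
        notes.append(notes[-1] + step)
    return notes

def transpose(notes, semitones):
    """Shift every pitch by a fixed number of semitones."""
    return [n + semitones for n in notes]

c_major = major_scale(60)        # [60, 62, 64, 65, 67, 69, 71, 72]
d_major = transpose(c_major, 2)  # same contour, two semitones higher
```

Starting from 60, the pattern reproduces exactly the note list passed to `magenta.music.Melody` above; higher-level interfaces hide this arithmetic behind arguments like `scale='C_major'`.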
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Pros of Audiocraft
- More comprehensive audio generation capabilities, including music, sound effects, and speech synthesis
- Offers pre-trained models for immediate use, such as MusicGen and AudioGen
- Actively maintained with recent updates and contributions
Cons of Audiocraft
- Focused primarily on audio generation, lacking some music analysis features found in Muzic
- Requires more computational resources due to its advanced models
- Steeper learning curve for users new to audio AI technologies
Code Comparison
Audiocraft example (audio generation):
import torchaudio
from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('medium')
wav = model.generate_unconditional(4, progress=True)
torchaudio.save('generated_music.wav', wav[0].cpu(), model.sample_rate)
Muzic example (music analysis):
from muzic import MusicAnalyzer
analyzer = MusicAnalyzer()
features = analyzer.extract_features('song.mp3')
print(features)
Summary
Audiocraft excels in audio generation tasks, offering powerful pre-trained models for various audio synthesis applications. Muzic, on the other hand, provides a broader range of music-specific analysis tools. While Audiocraft is more suitable for creating new audio content, Muzic is better suited for tasks involving music understanding and analysis.
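The `MusicAnalyzer` snippet above is a high-level illustration; concretely, audio feature extraction reduces to computations over the signal's spectrum. Here is a minimal NumPy-only sketch of one classic feature, the spectral centroid — a hypothetical example, not code from either repository:

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of the signal's spectrum, in Hz."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

# A pure 440 Hz sine should have a centroid very close to 440 Hz.
sr = 22050
t = np.arange(sr) / sr                # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)
print(spectral_centroid(tone, sr))    # close to 440.0
```

Real analyzers stack many such features (MFCCs, chroma, onset strength) computed over short windows rather than the whole signal.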
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Pros of NeMo
- Broader scope: Covers speech recognition, natural language processing, and text-to-speech, not just music generation
- More active development: More frequent updates and contributions
- Better documentation and examples for getting started
Cons of NeMo
- Steeper learning curve due to its broader scope
- Requires more computational resources for training and inference
- Less focused on music-specific tasks compared to Muzic
Code Comparison
Muzic example (symbolic music generation):
from muzic.models import MusicTransformer
model = MusicTransformer.from_pretrained('music-transformer-midi')
generated = model.generate(max_length=512, temperature=1.0)
NeMo example (speech recognition):
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
transcription = asr_model.transcribe(["audio_file.wav"])
Both repositories offer high-level APIs for their respective tasks, but NeMo's broader scope is evident in its more diverse set of models and functionalities. Muzic focuses specifically on music-related tasks, while NeMo covers a wider range of audio and speech processing applications.
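The `temperature` argument in the Muzic snippet above rescales the model's output logits before sampling: values below 1 sharpen the distribution toward the most likely tokens, values above 1 flatten it toward uniform. A minimal NumPy sketch of the mechanism (illustrative, not code from either repository):

```python
import numpy as np

def temperature_softmax(logits, temperature):
    """Softmax over logits divided by temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.5])
sharp = temperature_softmax(logits, 0.5)  # low T: mass concentrates on argmax
flat = temperature_softmax(logits, 2.0)   # high T: closer to uniform
```

With `temperature=1.0`, as in the Muzic call, the logits are sampled unmodified; lowering it trades diversity for coherence.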
Code for the paper "Jukebox: A Generative Model for Music"
Pros of Jukebox
- More advanced and capable of generating complete songs with vocals
- Produces higher quality audio output
- Offers pre-trained models for immediate use
Cons of Jukebox
- Requires significant computational resources
- Less focused on music theory and composition
- Limited customization options for users
Code Comparison
Muzic (Python):
from muzic import MusicGenerator
generator = MusicGenerator()
melody = generator.generate_melody(length=16, scale='C_major')
Jukebox (Python):
import jukebox
from jukebox.sample import sample_partial_window
model = jukebox.load_model('1b_lyrics')
sample = sample_partial_window(model, ...)
Key Differences
- Muzic focuses on symbolic music generation and analysis
- Jukebox specializes in raw audio generation, including vocals
- Muzic offers more tools for music theory and composition
- Jukebox provides end-to-end audio generation capabilities
Use Cases
Muzic:
- Music education and theory applications
- Algorithmic composition and MIDI generation
- Music analysis and research
Jukebox:
- AI-generated complete songs with vocals
- Audio synthesis and sound design
- Exploring advanced deep learning for music generation
An AI for Music Generation
Pros of MuseGAN
- Focused specifically on multi-track music generation
- Provides pre-trained models for immediate use
- Includes a comprehensive evaluation metrics suite
Cons of MuseGAN
- Less actively maintained (last update in 2019)
- Limited to MIDI format generation
- Narrower scope compared to Muzic's diverse music AI tasks
Code Comparison
MuseGAN (model definition):
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, n_tracks, n_bars, n_steps_per_bar, n_pitches):
        super().__init__()
        self.z_dim = 32
        self.hidden_dim = 512
        self.n_tracks = n_tracks
        self.n_bars = n_bars
        self.n_steps_per_bar = n_steps_per_bar
        self.n_pitches = n_pitches
Muzic (model usage):
from muzic.models import MusicTransformer
model = MusicTransformer()
generated_music = model.generate(
    prompt="C4 E4 G4",
    max_length=512,
    temperature=0.9
)
Both repositories focus on AI-driven music generation, but Muzic offers a broader range of tools and more recent updates. MuseGAN provides a specialized approach for multi-track generation, while Muzic encompasses various music AI tasks with a more extensive toolkit.
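MuseGAN's generator is parameterized by tracks, bars, steps per bar, and pitches, which are the dimensions of the multi-track piano-roll tensor it generates. A small NumPy sketch of that representation, using the same dimension names with hypothetical example values:

```python
import numpy as np

# Dimensions mirroring the Generator signature above (values are examples).
n_tracks, n_bars, n_steps_per_bar, n_pitches = 4, 2, 16, 84

# A binary piano roll: piano_roll[track, bar, step, pitch] == 1 means
# that track sounds that pitch at that time step.
piano_roll = np.zeros(
    (n_tracks, n_bars, n_steps_per_bar, n_pitches), dtype=np.int8
)

# Place a C-major triad (pitches 60, 64, 67) on track 0,
# first step of the first bar.
for pitch in (60, 64, 67):
    piano_roll[0, 0, 0, pitch] = 1

print(piano_roll.shape)  # (4, 2, 16, 84)
```

The generator maps a latent vector of size `z_dim` to a tensor of this shape, one slice per instrument track.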
README
Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence. Muzic is pronounced [ˈmjuːzeɪk]. Besides the image version of the logo (see above), Muzic also has a video version. Muzic was started by researchers from Microsoft Research Asia and is also contributed to by outside collaborators.
We summarize the scope of our Muzic project in the following figure:
The current work in Muzic includes:
- Music Understanding: MusicBERT, PDAugment, CLaMP
- Music Generation
  - Song Writing
    - Lyric-to-Melody and Melody-to-Lyric: SongMASS
    - Lyric Generation: DeepRapper
    - Lyric-to-Melody Generation: TeleMelody, ReLyMe, Re-creation of Creations (ROC)
  - Music Form/Structure Generation
    - Music Form Generation: MeloForm
    - Long/Short Structure Modeling: Museformer
  - Multi-Track Generation: PopMAG, GETMusic
  - Text-to-Music Generation: MuseCoco
  - Singing Voice Synthesis: HiFiSinger
- AI Agent: MusicAgent
You can find music samples generated by our systems at https://ai-muzic.github.io/.
For more speech-related research, see https://speechresearch.github.io/ and https://github.com/microsoft/NeuralSpeech.
We are hiring!
We are hiring both research FTEs and research interns on Speech/Audio/Music/Video and LLMs. Please get in touch with Xu Tan (tanxu2012@gmail.com) if you are interested.
What is New?
- CLaMP has won the Best Student Paper Award at ISMIR 2023!
- We release MusicAgent, an AI agent for versatile music processing using large language models.
- We release MuseCoco, a music composition copilot to generate symbolic music from text.
- We release GETMusic, a versatile music copilot with a universal representation and diffusion framework to generate any music tracks.
- We release the first model for cross-modal symbolic MIR: CLaMP.
- We release two new research works on music structure modeling: MeloForm and Museformer.
- We give a tutorial on AI Music Composition at ACM Multimedia 2021.
Requirements
The operating system is Linux; we have tested on Ubuntu 16.04.6 LTS with CUDA 10 and Python 3.6.12. The dependencies are listed in requirements.txt. To install them, run:
pip install -r requirements.txt
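Before installing, it can help to confirm that the interpreter meets the stated baseline; since Muzic is tested with Python 3.6.12, a minimal sanity check (a generic snippet, not part of the repository) is:

```python
import sys

# Muzic's README reports testing with Python 3.6.12; require at least 3.6.
MINIMUM = (3, 6)

if sys.version_info[:2] < MINIMUM:
    raise RuntimeError(
        f"Python {MINIMUM[0]}.{MINIMUM[1]}+ required, "
        f"found {sys.version.split()[0]}"
    )
print("Python version OK:", sys.version.split()[0])
```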
We release the code of several research projects: MusicBERT, PDAugment, CLaMP, DeepRapper, SongMASS, TeleMelody, ReLyMe, Re-creation of Creations (ROC), MeloForm, Museformer, GETMusic, MuseCoco, and MusicAgent. See the README in the corresponding folder for detailed usage instructions.
Reference
If you find the Muzic project useful in your work, you can cite the papers as follows:
- [1] MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, ACL 2021.
- [2] PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription, Chen Zhang, Jiaxing Yu, Luchin Chang, Xu Tan, Jiawei Chen, Tao Qin, Kejun Zhang, ISMIR 2022.
- [3] DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling, Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu, ACL 2021.
- [4] SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint, Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin, AAAI 2021.
- [5] TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method, Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu, EMNLP 2022.
- [6] ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships, Chen Zhang, LuChin Chang, Songruoyao Wu, Xu Tan, Tao Qin, Tie-Yan Liu, Kejun Zhang, ACM Multimedia 2022.
- [7] Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation, Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan, arXiv 2022.
- [8] MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks, Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu, ISMIR 2022.
- [9] Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation, Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu, NeurIPS 2022.
- [10] PopMAG: Pop Music Accompaniment Generation, Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu, ACM Multimedia 2020.
- [11] HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis, Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu, arXiv 2020.
- [12] CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval, Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun, ISMIR 2023, Best Student Paper Award.
- [13] GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework, Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan, arXiv 2023.
- [14] MuseCoco: Generating Symbolic Music from Text, Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian, arXiv 2023.
- [15] MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models, Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian, EMNLP 2023 Demo.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.