Top Related Projects
Magenta: Music and Art Generation with Machine Intelligence
Deep learning driven jazz generation using Keras & Theano!
List of articles related to deep learning applied to music
Deezer source separation library including pretrained models.
Quick Overview
MuseGAN is a deep learning project for symbolic multi-track music generation. It uses Generative Adversarial Networks (GANs) to create multi-instrumental music in MIDI format, focusing on creating coherent and harmonious compositions across multiple tracks simultaneously.
Pros
- Generates multi-track music with coherent structure and harmony
- Produces MIDI output, which is easily editable and compatible with various music software
- Offers control over the generation process through conditional inputs
- Implements several model variants for different music generation tasks
Cons
- Requires significant computational resources for training
- Limited to generating music in specific genres (mainly rock/pop)
- May produce repetitive or unnatural-sounding patterns in some cases
- Requires musical knowledge to fine-tune and interpret results effectively
Code Examples
- Loading a pre-trained model:
from musegan.model import MuseGAN
model = MuseGAN()
model.load('path/to/pretrained/model')
- Generating a new multi-track composition:
import numpy as np
num_bars = 4
num_tracks = 5
z = np.random.normal(0, 1, (1, num_bars, model.z_dim))
generated_music = model.generate(z)
- Saving the generated music as a MIDI file:
from musegan.utils.midi_io import write_midi
write_midi(generated_music[0], 'output.mid', tempo=120)
Getting Started
- Clone the repository:
  git clone https://github.com/salu133445/musegan.git
  cd musegan
- Install dependencies:
  pip install -r requirements.txt
- Download pre-trained models:
  python scripts/download_pretrained.py
- Generate music:
  from musegan.model import MuseGAN
  from musegan.utils.midi_io import write_midi
  import numpy as np

  model = MuseGAN()
  model.load('pretrained_models/default')
  z = np.random.normal(0, 1, (1, 4, model.z_dim))
  generated_music = model.generate(z)
  write_midi(generated_music[0], 'output.mid', tempo=120)
This will generate a 4-bar multi-track composition and save it as 'output.mid'.
Competitor Comparisons
Magenta: Music and Art Generation with Machine Intelligence
Pros of Magenta
- Broader scope, covering multiple music generation tasks and models
- Larger community and more active development
- Better documentation and tutorials for getting started
Cons of Magenta
- More complex codebase due to its broader scope
- Steeper learning curve for beginners
- May require more computational resources for some models
Code Comparison
MuseGAN (main model architecture):
def generator(input_tensor):
    net = tf.layers.dense(input_tensor, 1024, activation=tf.nn.leaky_relu)
    net = tf.layers.dense(net, 4096, activation=tf.nn.leaky_relu)
    net = tf.reshape(net, [-1, 16, 16, 16])
    net = tf.layers.conv2d_transpose(net, 64, 5, strides=2, padding='same')
    net = tf.layers.conv2d_transpose(net, 1, 5, strides=2, padding='same')
    return net
Magenta (MelodyRNN model):
def build_graph(mode, config, sequence_example_file_paths=None):
    model = melody_rnn_model.MelodyRnnModel(config)
    sequence_features = {
        'inputs': tf.FixedLenSequenceFeature([], dtype=tf.int64, default_value=0),
        'inputs_lengths': tf.FixedLenSequenceFeature([], dtype=tf.int64, default_value=0)
    }
    return model.build_graph(mode, sequence_features, sequence_example_file_paths)
Deep learning driven jazz generation using Keras & Theano!
Pros of DeepJazz
- Focused specifically on jazz music generation
- Simpler architecture, potentially easier to understand and modify
- Includes a web interface for demo purposes
Cons of DeepJazz
- Less versatile, limited to jazz genre
- Older project with fewer recent updates
- Smaller scale, potentially less sophisticated output
Code Comparison
DeepJazz:
def generate(self):
    xIni = np.random.randint(0, self.n_vocab, size=(1, self.maxlen))
    for i in range(MAX_EPOCHS):
        preds = self.model.predict(xIni, verbose=0)[0]
        next_index = self.sample(preds, temperature=1.0)
        next_char = self.indices_char[next_index]
MuseGAN:
def generate(self, n_bars, condition=None, temperature=1.0):
    generated = np.zeros((self.batch_size, n_bars, self.beat_resolution, self.n_pitches))
    for i in range(n_bars):
        z = np.random.normal(0, 1, (self.batch_size, self.z_dim))
        cond = condition[:, i] if condition is not None else None
        bar = self.generator.predict([z, cond], verbose=0)
        generated[:, i] = bar
    return generated
Both projects use similar approaches for music generation, employing neural networks to predict and generate musical sequences. DeepJazz focuses on character-level prediction for jazz, while MuseGAN uses a more complex architecture for multi-track music generation across various genres.
List of articles related to deep learning applied to music
Pros of awesome-deep-learning-music
- Comprehensive collection of resources on deep learning for music
- Regularly updated with new papers, projects, and tools
- Covers a wide range of topics including generation, analysis, and classification
Cons of awesome-deep-learning-music
- No actual implementation or code provided
- May be overwhelming for beginners due to the large amount of information
Code Comparison
musegan provides actual implementation:
from musegan.model import MuseGAN
model = MuseGAN(config)
model.train(train_data, valid_data)
awesome-deep-learning-music is a curated list, so it doesn't contain code:
## Music Generation
- [MusicVAE](https://github.com/magenta/magenta/tree/master/magenta/models/music_vae)
- [MuseGAN](https://github.com/salu133445/musegan)
Summary
musegan is a specific implementation of a music generation model, while awesome-deep-learning-music is a curated list of resources. musegan provides hands-on experience with a particular approach, while awesome-deep-learning-music offers a broader overview of the field. The choice between them depends on whether you're looking for a specific implementation or a comprehensive resource guide.
Deezer source separation library including pretrained models.
Pros of Spleeter
- Focused on audio source separation, particularly useful for isolating vocals and instruments
- Well-documented with clear usage instructions and pre-trained models
- Actively maintained with regular updates and community support
Cons of Spleeter
- Limited to source separation tasks, not designed for music generation
- Requires more computational resources for processing large audio files
- May introduce artifacts in separated audio, especially with complex mixes
Code Comparison
Spleeter:
from spleeter.separator import Separator
separator = Separator('spleeter:2stems')
separator.separate_to_file('audio_example.mp3', 'output/')
MuseGAN:
from musegan.core import MuseGAN
from musegan.components import NowbarHybrid
musegan = MuseGAN(NowbarHybrid())
musegan.load('pretrained_model.pkl')
samples = musegan.generate(n_bars=8, temperature=1.2)
Key Differences
- Spleeter focuses on audio separation, while MuseGAN is designed for music generation
- Spleeter has a simpler API for quick audio processing tasks
- MuseGAN offers more control over the music generation process with various parameters
Both projects serve different purposes in the music technology domain, with Spleeter being more practical for audio engineering tasks and MuseGAN catering to creative music composition applications.
README
MuseGAN
MuseGAN is a project on music generation. In a nutshell, we aim to generate polyphonic music of multiple tracks (instruments). The proposed models are able to generate music either from scratch, or by accompanying a track given a priori by the user.
We train the model with training data collected from Lakh Pianoroll Dataset to generate pop song phrases consisting of bass, drums, guitar, piano and strings tracks.
Sample results are available here.
Important Notes
- The latest implementation is based on the network architectures presented in BinaryMuseGAN, where the temporal structure is handled by 3D convolutional layers. The advantage of this design is its smaller network size, while the disadvantage is its reduced controllability, e.g., the capability of feeding different latent variables for different measures or tracks. (A toy sketch of this idea follows these notes.)
- The original code we used for running the experiments in the paper can be found in the v1 folder.
- Looking for a PyTorch version? Check out this repository.
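As a toy illustration of the idea only (not the actual BinaryMuseGAN network; every layer size below is made up), a stack of 3D transposed convolutions can expand a latent vector into a bars x timesteps x pitches pianoroll tensor for a single track:

import tensorflow as tf

# Toy sketch: upsample a latent vector into a (bars, timesteps, pitches, 1)
# pianoroll tensor with 3D transposed convolutions. The shapes and layer sizes
# are illustrative only and do not reproduce the real MuseGAN architecture.
def toy_3d_generator(z_dim=128):
    z = tf.keras.Input(shape=(z_dim,))
    x = tf.keras.layers.Dense(4 * 12 * 21 * 32, activation="relu")(z)
    x = tf.keras.layers.Reshape((4, 12, 21, 32))(x)  # (bars, time, pitch, channels)
    x = tf.keras.layers.Conv3DTranspose(16, 3, strides=(1, 2, 2),
                                        padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv3DTranspose(1, 3, strides=(1, 2, 2),
                                        padding="same", activation="sigmoid")(x)
    return tf.keras.Model(z, x)  # output shape: (batch, 4, 48, 84, 1)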
Prerequisites
Below we assume the working directory is the repository root.
Install dependencies
- Using pipenv (recommended)
  Make sure pipenv is installed. (If not, simply run pip install pipenv.)
  # Install the dependencies
  pipenv install
  # Activate the virtual environment
  pipenv shell
- Using pip
  # Install the dependencies
  pip install -r requirements.txt
Prepare training data
The training data is collected from Lakh Pianoroll Dataset (LPD), a new multitrack pianoroll dataset.
# Download the training data
./scripts/download_data.sh
# Store the training data to shared memory
./scripts/process_data.sh
You can also download the training data manually (train_x_lpd_5_phr.npz).
As pianoroll matrices are generally sparse, we store only the indices of nonzero elements and the array shape into an npz file to save space, and later restore the original array. To save some training data (data) into this format, simply run:
np.savez_compressed("data.npz", shape=data.shape, nonzero=data.nonzero())
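To restore the original array from such a file, a minimal sketch (assuming the stored data is a boolean pianoroll; adjust the dtype otherwise) looks like this:

import numpy as np

# Rebuild a dense array from the sparse npz format described above.
with np.load("data.npz") as f:
    data = np.zeros(f["shape"], dtype=bool)   # assumes boolean pianoroll data
    data[tuple(f["nonzero"])] = True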
Scripts
We provide several shell scripts for easily managing the experiments. (See here for detailed documentation.)
Below we assume the working directory is the repository root.
Train a new model
- Run the following command to set up a new experiment with default settings.
  # Set up a new experiment
  ./scripts/setup_exp.sh "./exp/my_experiment/" "Some notes on my experiment"
- Modify the configuration and model parameter files for experimental settings.
- You can either train the model:
  # Train the model
  ./scripts/run_train.sh "./exp/my_experiment/" "0"
  or run the experiment (training + inference + interpolation):
  # Run the experiment
  ./scripts/run_exp.sh "./exp/my_experiment/" "0"
Collect training data
Run the following command to collect training data from MIDI files.
# Collect training data
./scripts/collect_data.sh "./midi_dir/" "data/train.npy"
Use pretrained models
- Download the pretrained models:
  # Download the pretrained models
  ./scripts/download_models.sh
  You can also download the pretrained models manually (pretrained_models.tar.gz).
- You can either perform inference from a trained model:
  # Run inference from a pretrained model
  ./scripts/run_inference.sh "./exp/default/" "0"
  or perform interpolation from a trained model:
  # Run interpolation from a pretrained model
  ./scripts/run_interpolation.sh "./exp/default/" "0"
Outputs
By default, samples will be generated alongside the training. You can disable this behavior by setting save_samples_steps to zero in the configuration file (config.yaml). The generated samples will be stored in the following three formats by default.
- .npy: raw numpy arrays
- .png: image files
- .npz: multitrack pianoroll files that can be loaded by the Pypianoroll package
You can disable saving in a specific format by setting save_array_samples, save_image_samples and save_pianoroll_samples to False in the configuration file.
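For example, a minimal sketch for flipping these options programmatically (the key names come from this README, but their exact nesting inside config.yaml is an assumption here, so adjust the path and structure to your experiment):

import yaml  # PyYAML

config_path = "./exp/my_experiment/config.yaml"  # placeholder experiment directory
with open(config_path) as f:
    config = yaml.safe_load(f)

config["save_samples_steps"] = 0          # stop generating samples during training
config["save_image_samples"] = False      # skip the .png outputs
config["save_pianoroll_samples"] = False  # skip the .npz outputs

with open(config_path, "w") as f:
    yaml.safe_dump(config, f)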
The generated pianorolls are stored in .npz format to save space and processing time. You can use the following code to write them into MIDI files.
from pypianoroll import Multitrack
m = Multitrack('./test.npz')
m.write('./test.mid')
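A short sketch for converting a whole directory of generated .npz files at once (using the same Multitrack calls as above; the results path below is only a placeholder):

from pathlib import Path
from pypianoroll import Multitrack

# Write a MIDI file next to every generated .npz pianoroll in a directory.
results_dir = Path("./exp/default/results")  # placeholder; point this at your samples
for npz_path in results_dir.glob("*.npz"):
    Multitrack(str(npz_path)).write(str(npz_path.with_suffix(".mid")))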
Sample Results
Some sample results can be found in the ./exp/ directory. More samples can be downloaded from the following links.
- sample_results.tar.gz (54.7 MB): sample inference and interpolation results
- training_samples.tar.gz (18.7 MB): sample generated results at different steps
Citing
Please cite the following paper if you use the code provided in this repository.
Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang, "MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment," AAAI Conference on Artificial Intelligence (AAAI), 2018. (*equal contribution)
[homepage]
[arXiv]
[paper]
[slides]
[code]
Papers
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang (*equal contribution)
AAAI Conference on Artificial Intelligence (AAAI), 2018.
[homepage]
[arXiv]
[paper]
[slides]
[code]
Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation
Hao-Wen Dong and Yi-Hsuan Yang
International Society for Music Information Retrieval Conference (ISMIR), 2018.
[homepage]
[video]
[paper]
[slides]
[slides (long)]
[poster]
[arXiv]
[code]
MuseGAN: Demonstration of a Convolutional GAN Based Model for Generating Multi-track Piano-rolls
Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang (*equal contribution)
ISMIR Late-Breaking Demos, 2017.
[paper]
[poster]