Top Related Projects
Magenta: Music and Art Generation with Machine Intelligence
Deep learning driven jazz generation using Keras & Theano!
List of articles related to deep learning applied to music
Deezer source separation library including pretrained models.
Quick Overview
MuseGAN is a deep learning project for symbolic multi-track music generation. It uses Generative Adversarial Networks (GANs) to create multi-instrumental music in MIDI format, focusing on creating coherent and harmonious compositions across multiple tracks simultaneously.
Pros
- Generates multi-track music with coherent structure and harmony
- Produces MIDI output, which is easily editable and compatible with various music software
- Offers control over the generation process through conditional inputs
- Implements several model variants for different music generation tasks
Cons
- Requires significant computational resources for training
- Limited to generating music in specific genres (mainly rock/pop)
- May produce repetitive or unnatural-sounding patterns in some cases
- Requires musical knowledge to fine-tune and interpret results effectively
Code Examples
- Loading a pre-trained model:
from musegan.model import MuseGAN
model = MuseGAN()
model.load('path/to/pretrained/model')
- Generating a new multi-track composition:
import numpy as np
num_bars = 4
num_tracks = 5
z = np.random.normal(0, 1, (1, num_bars, model.z_dim))
generated_music = model.generate(z)
- Saving the generated music as a MIDI file:
from musegan.utils.midi_io import write_midi
write_midi(generated_music[0], 'output.mid', tempo=120)
Getting Started
- Clone the repository:
  git clone https://github.com/salu133445/musegan.git
  cd musegan
- Install dependencies:
  pip install -r requirements.txt
- Download pre-trained models:
  python scripts/download_pretrained.py
- Generate music:
  from musegan.model import MuseGAN
  from musegan.utils.midi_io import write_midi
  import numpy as np

  model = MuseGAN()
  model.load('pretrained_models/default')
  z = np.random.normal(0, 1, (1, 4, model.z_dim))
  generated_music = model.generate(z)
  write_midi(generated_music[0], 'output.mid', tempo=120)
This will generate a 4-bar multi-track composition and save it as 'output.mid'.
Competitor Comparisons
Magenta: Music and Art Generation with Machine Intelligence
Pros of Magenta
- Broader scope, covering multiple music generation tasks and models
- Larger community and more active development
- Better documentation and tutorials for getting started
Cons of Magenta
- More complex codebase due to its broader scope
- Steeper learning curve for beginners
- May require more computational resources for some models
Code Comparison
MuseGAN (main model architecture):
def generator(input_tensor):
    net = tf.layers.dense(input_tensor, 1024, activation=tf.nn.leaky_relu)
    net = tf.layers.dense(net, 4096, activation=tf.nn.leaky_relu)
    net = tf.reshape(net, [-1, 16, 16, 16])
    net = tf.layers.conv2d_transpose(net, 64, 5, strides=2, padding='same')
    net = tf.layers.conv2d_transpose(net, 1, 5, strides=2, padding='same')
    return net
Magenta (MelodyRNN model):
def build_graph(mode, config, sequence_example_file_paths=None):
    model = melody_rnn_model.MelodyRnnModel(config)
    sequence_features = {
        'inputs': tf.FixedLenSequenceFeature([], dtype=tf.int64, default_value=0),
        'inputs_lengths': tf.FixedLenSequenceFeature([], dtype=tf.int64, default_value=0)
    }
    return model.build_graph(mode, sequence_features, sequence_example_file_paths)
Deep learning driven jazz generation using Keras & Theano!
Pros of DeepJazz
- Focused specifically on jazz music generation
- Simpler architecture, potentially easier to understand and modify
- Includes a web interface for demo purposes
Cons of DeepJazz
- Less versatile, limited to jazz genre
- Older project with fewer recent updates
- Smaller scale, potentially less sophisticated output
Code Comparison
DeepJazz:
def generate(self):
    xIni = np.random.randint(0, self.n_vocab, size=(1, self.maxlen))
    for i in range(MAX_EPOCHS):
        preds = self.model.predict(xIni, verbose=0)[0]
        next_index = self.sample(preds, temperature=1.0)
        next_char = self.indices_char[next_index]
MuseGAN:
def generate(self, n_bars, condition=None, temperature=1.0):
    generated = np.zeros((self.batch_size, n_bars, self.beat_resolution, self.n_pitches))
    for i in range(n_bars):
        z = np.random.normal(0, 1, (self.batch_size, self.z_dim))
        cond = condition[:, i] if condition is not None else None
        bar = self.generator.predict([z, cond], verbose=0)
        generated[:, i] = bar
    return generated
Both projects use similar approaches for music generation, employing neural networks to predict and generate musical sequences. DeepJazz focuses on character-level prediction for jazz, while MuseGAN uses a more complex architecture for multi-track music generation across various genres.
List of articles related to deep learning applied to music
Pros of awesome-deep-learning-music
- Comprehensive collection of resources on deep learning for music
- Regularly updated with new papers, projects, and tools
- Covers a wide range of topics including generation, analysis, and classification
Cons of awesome-deep-learning-music
- No actual implementation or code provided
- May be overwhelming for beginners due to the large amount of information
Code Comparison
musegan provides actual implementation:
from musegan.model import MuseGAN
model = MuseGAN(config)
model.train(train_data, valid_data)
awesome-deep-learning-music is a curated list, so it doesn't contain code:
## Music Generation
- [MusicVAE](https://github.com/magenta/magenta/tree/master/magenta/models/music_vae)
- [MuseGAN](https://github.com/salu133445/musegan)
Summary
musegan is a specific implementation of a music generation model, while awesome-deep-learning-music is a curated list of resources. musegan provides hands-on experience with a particular approach, while awesome-deep-learning-music offers a broader overview of the field. The choice between them depends on whether you're looking for a specific implementation or a comprehensive resource guide.
Deezer source separation library including pretrained models.
Pros of Spleeter
- Focused on audio source separation, particularly useful for isolating vocals and instruments
- Well-documented with clear usage instructions and pre-trained models
- Actively maintained with regular updates and community support
Cons of Spleeter
- Limited to source separation tasks, not designed for music generation
- Requires more computational resources for processing large audio files
- May introduce artifacts in separated audio, especially with complex mixes
Code Comparison
Spleeter:
from spleeter.separator import Separator
separator = Separator('spleeter:2stems')
separator.separate_to_file('audio_example.mp3', 'output/')
MuseGAN:
from musegan.core import MuseGAN
from musegan.components import NowbarHybrid
musegan = MuseGAN(NowbarHybrid())
musegan.load('pretrained_model.pkl')
samples = musegan.generate(n_bars=8, temperature=1.2)
Key Differences
- Spleeter focuses on audio separation, while MuseGAN is designed for music generation
- Spleeter has a simpler API for quick audio processing tasks
- MuseGAN offers more control over the music generation process with various parameters
Both projects serve different purposes in the music technology domain, with Spleeter being more practical for audio engineering tasks and MuseGAN catering to creative music composition applications.
README
MuseGAN
MuseGAN is a project on music generation. In a nutshell, we aim to generate polyphonic music of multiple tracks (instruments). The proposed models are able to generate music either from scratch, or by accompanying a track given a priori by the user.
We train the model with training data collected from Lakh Pianoroll Dataset to generate pop song phrases consisting of bass, drums, guitar, piano and strings tracks.
Sample results are available here.
Important Notes
- The latest implementation is based on the network architectures presented in BinaryMuseGAN, where the temporal structure is handled by 3D convolutional layers. The advantage of this design is its smaller network size, while the disadvantage is its reduced controllability, e.g., the capability of feeding different latent variables for different measures or tracks. (A toy sketch of this idea follows these notes.)
- The original code we used for running the experiments in the paper can be found in the v1 folder.
- Looking for a PyTorch version? Check out this repository.
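As a toy illustration of the idea only (not the actual BinaryMuseGAN network; every layer size below is made up), a stack of 3D transposed convolutions can expand a latent vector into a bars x timesteps x pitches pianoroll tensor for a single track:

import tensorflow as tf

# Toy sketch: upsample a latent vector into a (bars, timesteps, pitches, 1)
# pianoroll tensor with 3D transposed convolutions. The shapes and layer sizes
# are illustrative only and do not reproduce the real MuseGAN architecture.
def toy_3d_generator(z_dim=128):
    z = tf.keras.Input(shape=(z_dim,))
    x = tf.keras.layers.Dense(4 * 12 * 21 * 32, activation="relu")(z)
    x = tf.keras.layers.Reshape((4, 12, 21, 32))(x)  # (bars, time, pitch, channels)
    x = tf.keras.layers.Conv3DTranspose(16, 3, strides=(1, 2, 2),
                                        padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv3DTranspose(1, 3, strides=(1, 2, 2),
                                        padding="same", activation="sigmoid")(x)
    return tf.keras.Model(z, x)  # output shape: (batch, 4, 48, 84, 1)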
Prerequisites
Below we assume the working directory is the repository root.
Install dependencies
- Using pipenv (recommended)
  Make sure pipenv is installed. (If not, simply run pip install pipenv.)
  # Install the dependencies
  pipenv install
  # Activate the virtual environment
  pipenv shell
- Using pip
  # Install the dependencies
  pip install -r requirements.txt
Prepare training data
The training data is collected from Lakh Pianoroll Dataset (LPD), a new multitrack pianoroll dataset.
# Download the training data
./scripts/download_data.sh
# Store the training data to shared memory
./scripts/process_data.sh
You can also download the training data manually (train_x_lpd_5_phr.npz).
As pianoroll matrices are generally sparse, we store only the indices of nonzero elements and the array shape into an npz file to save space, and later restore the original array. To save some training data (data) into this format, simply run:
np.savez_compressed("data.npz", shape=data.shape, nonzero=data.nonzero())
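To restore the original array from such a file, a minimal sketch (assuming the stored data is a boolean pianoroll; adjust the dtype otherwise) looks like this:

import numpy as np

# Rebuild a dense array from the sparse npz format described above.
with np.load("data.npz") as f:
    data = np.zeros(f["shape"], dtype=bool)   # assumes boolean pianoroll data
    data[tuple(f["nonzero"])] = True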
Scripts
We provide several shell scripts for easily managing the experiments. (See here for detailed documentation.)
Below we assume the working directory is the repository root.
Train a new model
- Run the following command to set up a new experiment with default settings.
  # Set up a new experiment
  ./scripts/setup_exp.sh "./exp/my_experiment/" "Some notes on my experiment"
- Modify the configuration and model parameter files for experimental settings.
- You can either train the model:
  # Train the model
  ./scripts/run_train.sh "./exp/my_experiment/" "0"
  or run the experiment (training + inference + interpolation):
  # Run the experiment
  ./scripts/run_exp.sh "./exp/my_experiment/" "0"
Collect training data
Run the following command to collect training data from MIDI files.
# Collect training data
./scripts/collect_data.sh "./midi_dir/" "data/train.npy"
Use pretrained models
- Download the pretrained models:
  # Download the pretrained models
  ./scripts/download_models.sh
  You can also download the pretrained models manually (pretrained_models.tar.gz).
- You can either perform inference from a trained model:
  # Run inference from a pretrained model
  ./scripts/run_inference.sh "./exp/default/" "0"
  or perform interpolation from a trained model:
  # Run interpolation from a pretrained model
  ./scripts/run_interpolation.sh "./exp/default/" "0"
Outputs
By default, samples will be generated alongside the training. You can disable this behavior by setting save_samples_steps to zero in the configuration file (config.yaml). The generated samples will be stored in the following three formats by default.
- .npy: raw numpy arrays
- .png: image files
- .npz: multitrack pianoroll files that can be loaded by the Pypianoroll package
You can disable saving in a specific format by setting save_array_samples, save_image_samples and save_pianoroll_samples to False in the configuration file.
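For example, a minimal sketch for flipping these options programmatically (the key names come from this README, but their exact nesting inside config.yaml is an assumption here, so adjust the path and structure to your experiment):

import yaml  # PyYAML

config_path = "./exp/my_experiment/config.yaml"  # placeholder experiment directory
with open(config_path) as f:
    config = yaml.safe_load(f)

config["save_samples_steps"] = 0          # stop generating samples during training
config["save_image_samples"] = False      # skip the .png outputs
config["save_pianoroll_samples"] = False  # skip the .npz outputs

with open(config_path, "w") as f:
    yaml.safe_dump(config, f)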
The generated pianorolls are stored in .npz format to save space and processing time. You can use the following code to write them into MIDI files.
from pypianoroll import Multitrack
m = Multitrack('./test.npz')
m.write('./test.mid')
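A short sketch for converting a whole directory of generated .npz files at once (using the same Multitrack calls as above; the results path below is only a placeholder):

from pathlib import Path
from pypianoroll import Multitrack

# Write a MIDI file next to every generated .npz pianoroll in a directory.
results_dir = Path("./exp/default/results")  # placeholder; point this at your samples
for npz_path in results_dir.glob("*.npz"):
    Multitrack(str(npz_path)).write(str(npz_path.with_suffix(".mid")))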
Sample Results
Some sample results can be found in the ./exp/ directory. More samples can be downloaded from the following links.
- sample_results.tar.gz (54.7 MB): sample inference and interpolation results
- training_samples.tar.gz (18.7 MB): sample generated results at different steps
Citing
Please cite the following paper if you use the code provided in this repository.
Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang, "MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment," AAAI Conference on Artificial Intelligence (AAAI), 2018. (*equal contribution)
[homepage]
[arXiv]
[paper]
[slides]
[code]
Papers
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang (*equal contribution)
AAAI Conference on Artificial Intelligence (AAAI), 2018.
[homepage]
[arXiv]
[paper]
[slides]
[code]
Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation
Hao-Wen Dong and Yi-Hsuan Yang
International Society for Music Information Retrieval Conference (ISMIR), 2018.
[homepage]
[video]
[paper]
[slides]
[slides (long)]
[poster]
[arXiv]
[code]
MuseGAN: Demonstration of a Convolutional GAN Based Model for Generating Multi-track Piano-rolls
Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang and Yi-Hsuan Yang (*equal contribution)
ISMIR Late-Breaking Demos, 2017.
[paper]
[poster]