audiocraft
AudioCraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor/tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Top Related Projects
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch
🔊 Text-Prompted Generative Audio Model
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Muzic: Music Understanding and Generation with Artificial Intelligence
Code for the paper "Jukebox: A Generative Model for Music"
Quick Overview
AudioCraft is a PyTorch library and collection of models for audio generation developed by Facebook Research. It includes state-of-the-art models for text-to-audio, text-to-music, and audio compression tasks, with a focus on high-quality and controllable audio synthesis.
Pros
- Offers cutting-edge models for various audio generation tasks
- Provides pre-trained models for easy use and fine-tuning
- Supports both CPU and GPU acceleration for efficient processing
- Includes comprehensive documentation and examples
Cons
- Requires significant computational resources for training and inference
- Limited to PyTorch ecosystem, which may not suit all developers
- Steep learning curve for those unfamiliar with deep learning audio techniques
- Potential ethical concerns regarding AI-generated audio content
Code Examples
- Loading and using a pre-trained MusicGen model:
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=8)  # length of the generated clip, in seconds
descriptions = ['An electronic dance track with a strong beat']
wav = model.generate(descriptions, progress=True)  # returns a [batch, channels, time] tensor
torchaudio.save('generated_music.wav', wav[0].cpu(), model.sample_rate)
- Generating audio from text using AudioGen:
import torchaudio
from audiocraft.models import AudioGen

model = AudioGen.get_pretrained('facebook/audiogen-medium')
descriptions = ['A dog barking in the distance']
wav = model.generate(descriptions, progress=True)
torchaudio.save('generated_audio.wav', wav[0].cpu(), model.sample_rate)
- Compressing audio using EnCodec:
import torch
import torchaudio
from audiocraft.models import CompressionModel

# AudioCraft exposes EnCodec through the CompressionModel wrapper
model = CompressionModel.get_pretrained('facebook/encodec_24khz')
wav, sr = torchaudio.load('input_audio.wav')
wav = torchaudio.functional.resample(wav, sr, model.sample_rate)  # EnCodec expects its own rate (24 kHz here)
wav = wav.mean(dim=0, keepdim=True).unsqueeze(0)  # mono, shaped [batch, channels, time]
with torch.no_grad():
    codes, scale = model.encode(wav)  # discrete codes plus an optional rescaling factor
    decoded_wav = model.decode(codes, scale)
torchaudio.save('reconstructed_audio.wav', decoded_wav.squeeze(0).cpu(), model.sample_rate)
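- Conditioning MusicGen on a reference melody. A minimal sketch using the dedicated melody checkpoint, assuming a local melody.wav as the reference clip (the file name is illustrative):
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)
melody, sr = torchaudio.load('melody.wav')  # placeholder path for your own reference clip
# condition on both the text description and the melody's chromagram
wav = model.generate_with_chroma(['A cheerful orchestral rendition'], melody[None], sr)
torchaudio.save('melody_conditioned.wav', wav[0].cpu(), model.sample_rate)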
Getting Started
To get started with AudioCraft:
- Install the library:
pip install audiocraft
- Import and use a model:
from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('facebook/musicgen-small')
wav = model.generate(['A cheerful country song with acoustic guitar'])
- Save the generated audio:
import torchaudio
torchaudio.save('output.wav', wav[0].cpu(), model.sample_rate)
Competitor Comparisons
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch
Pros of MusicLM-PyTorch
- Simpler implementation, easier to understand and modify
- Focuses specifically on music generation
- More lightweight and potentially faster to train
Cons of MusicLM-PyTorch
- Less comprehensive feature set compared to AudioCraft
- May require more manual setup and configuration
- Limited to music generation, while AudioCraft covers broader audio tasks
Code Comparison
MusicLM-PyTorch:
from musiclm_pytorch import MuLaN, AudioSpectrogramTransformer, TextTransformer

# assemble MuLaN, the joint audio-text embedding the model is trained around
audio_transformer = AudioSpectrogramTransformer(
    dim = 512, depth = 6, heads = 8, dim_head = 64,
    spec_n_fft = 128, spec_win_length = 24, spec_aug_stretch_factor = 0.8
)
text_transformer = TextTransformer(dim = 512, depth = 6, heads = 8, dim_head = 64)
mulan = MuLaN(audio_transformer = audio_transformer, text_transformer = text_transformer)
AudioCraft:
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=8)
wav = model.generate_unconditional(4)  # four unconditional samples
MusicLM-PyTorch provides a more low-level implementation, allowing for greater customization of model architecture. AudioCraft offers a higher-level API with pre-trained models, making it easier to get started with audio generation tasks out of the box.
🔊 Text-Prompted Generative Audio Model
Pros of Bark
- Supports multi-language text-to-speech generation
- Offers voice cloning capabilities
- Provides more fine-grained control over speech characteristics
Cons of Bark
- Limited to speech generation only
- Requires more computational resources for inference
- Less extensive documentation and community support
Code Comparison
Bark:
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()
audio_array = generate_audio("Hello, world!")
write_wav('bark_output.wav', SAMPLE_RATE, audio_array)
AudioCraft:
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')
wav = model.generate_unconditional(4)
torchaudio.save('audio.wav', wav[0].cpu(), model.sample_rate)
Key Differences
- Bark focuses on speech synthesis, while AudioCraft primarily targets music generation
- AudioCraft offers more comprehensive audio manipulation capabilities
- Bark provides more options for voice customization
- AudioCraft has better integration with the PyTorch ecosystem
- Bark's codebase is more accessible for beginners
Use Cases
- Bark: Text-to-speech applications, voice assistants, voice cloning
- AudioCraft: Music production, sound design, audio content creation
Community and Support
- AudioCraft benefits from Facebook's backing and larger developer community
- Bark has a growing community but less extensive resources
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Pros of AudioGPT
- Offers a more comprehensive suite of audio-related tasks, including speech recognition and audio captioning
- Provides a user-friendly interface for interacting with the model
- Integrates multiple pre-trained models for various audio tasks
Cons of AudioGPT
- Less focused on high-quality music generation compared to Audiocraft
- May require more computational resources due to its broader scope
- Documentation and examples are less extensive than Audiocraft
Code Comparison
AudioGPT:
from audiopgt import AudioGPT
model = AudioGPT()
result = model.generate_audio("Create a piano melody")
result.save("output.wav")
Audiocraft:
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=10)
wav = model.generate(['Create a piano melody'])
audio_write('output', wav[0].cpu(), model.sample_rate, strategy="loudness")
Both repositories offer powerful audio generation capabilities, but they cater to different use cases. Audiocraft focuses primarily on high-quality music generation, while AudioGPT provides a broader range of audio-related tasks. The code comparison shows that Audiocraft's API is more specialized for music generation, while AudioGPT offers a more general-purpose interface for various audio tasks.
Muzic: Music Understanding and Generation with Artificial Intelligence
Pros of Muzic
- Broader scope, covering various music AI tasks like generation, understanding, and editing
- Includes tools for symbolic music generation and audio-to-MIDI conversion
- Offers a wider range of pre-trained models for different music-related tasks
Cons of Muzic
- Less focused on high-quality audio generation compared to Audiocraft
- May require more setup and configuration due to its broader feature set
- Documentation could be less comprehensive for specific audio generation tasks
Code Comparison
Muzic (Symbolic music generation):
# Illustrative sketch only: Muzic ships MusicBERT as fairseq training recipes
# and checkpoints, not as the pip-style API shown here.
from muzic.musicbert import MusicBERT
model = MusicBERT.from_pretrained('musicbert-base')
generated = model.generate(input_ids, max_length=512)  # input_ids: pre-tokenized symbolic music
Audiocraft (Audio generation):
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=8)
wav = model.generate(['A gentle piano piece'])
Both repositories offer powerful tools for music AI, but Muzic provides a broader range of features for various music-related tasks, while Audiocraft focuses more on high-quality audio generation. The choice between them depends on the specific requirements of your project.
Code for the paper "Jukebox: A Generative Model for Music"
Pros of Jukebox
- More established project with a longer history and potentially more stability
- Capable of generating complete songs with vocals and lyrics
- Offers a wider range of musical styles and genres
Cons of Jukebox
- Requires more computational resources and longer generation times
- Less user-friendly interface and setup process
- Output quality can be inconsistent, especially for vocals
Code Comparison
Jukebox:
from jukebox.make_models import MODELS, make_vqvae, make_prior
from jukebox.hparams import setup_hparams

vqvae_name, *prior_names = MODELS['5b']  # checkpoint names for the VQ-VAE and priors
vqvae = make_vqvae(setup_hparams(vqvae_name, dict(sample_length=1048576)), 'cuda')
top_prior = make_prior(setup_hparams(prior_names[-1], dict()), vqvae, 'cuda')
Audiocraft:
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-melody')
wav = model.generate_unconditional(num_samples=1, progress=True)
Audiocraft offers a more streamlined API for generating music, while Jukebox requires more complex setup and encoding steps. Audiocraft's approach is more user-friendly and easier to integrate into projects, but Jukebox provides more fine-grained control over the generation process.
README
AudioCraft
AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.
Installation
AudioCraft requires Python 3.9 and PyTorch 2.1.0. To install AudioCraft, run the following:
# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
python -m pip install 'torch==2.1.0'
# You might need the following before trying to install the packages
python -m pip install setuptools wheel
# Then proceed to one of the following
python -m pip install -U audiocraft # stable release
python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge
python -m pip install -e . # or if you cloned the repo locally (mandatory if you want to train).
python -m pip install -e '.[wm]' # if you want to train a watermarking model
We also recommend having ffmpeg installed, either through your system or Anaconda:
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
Models
At the moment, AudioCraft contains the training code and inference code for the following models (a short loading sketch follows the list):
- MusicGen: A state-of-the-art controllable text-to-music model.
- AudioGen: A state-of-the-art text-to-sound model.
- EnCodec: A state-of-the-art high fidelity neural audio codec.
- Multi Band Diffusion: An EnCodec-compatible decoder using diffusion.
- MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.
- AudioSeal: A state-of-the-art audio watermarking model.
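The generation models share the same get_pretrained loading pattern. A minimal loading sketch, assuming the Hugging Face checkpoint names used in the model docs (exact names may vary across releases):
from audiocraft.models import MAGNeT, MultiBandDiffusion

# MAGNeT loads like MusicGen and AudioGen
magnet = MAGNeT.get_pretrained('facebook/magnet-small-10secs')
wav = magnet.generate(['an 80s electric guitar solo'])

# Multi Band Diffusion has factory helpers tied to the EnCodec checkpoints
mbd = MultiBandDiffusion.get_mbd_24khz(bw=3.0)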
Training code
AudioCraft contains PyTorch components for deep learning research in audio, along with training pipelines for the developed models. For a general introduction to AudioCraft's design principles and instructions for developing your own training pipeline, refer to the AudioCraft training documentation.
For reproducing existing work and using the developed training pipelines, refer to the instructions for each specific model, which provide pointers to configuration files, example grids, and model/task-specific information and FAQs.
API documentation
We provide some API documentation for AudioCraft.
FAQ
Is the training code available?
Yes! We provide the training code for EnCodec, MusicGen and Multi Band Diffusion.
Where are the models stored?
The models are stored in the Hugging Face cache. You can override this location for the AudioCraft models by setting the AUDIOCRAFT_CACHE_DIR environment variable.
To change the cache location of the other Hugging Face models, please check the Hugging Face Transformers documentation for the cache setup.
Finally, if you use a model that relies on Demucs (e.g. musicgen-melody) and want to change the download location for Demucs, refer to the Torch Hub documentation.
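For example, a minimal sketch of redirecting the AudioCraft cache from Python, set before the first model download (the path is illustrative):
import os

# Must be set before any model is fetched; '/data/audiocraft_models' is just an example path.
os.environ['AUDIOCRAFT_CACHE_DIR'] = '/data/audiocraft_models'

from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('facebook/musicgen-small')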
License
- The code in this repository is released under the MIT license as found in the LICENSE file.
- The model weights in this repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.
Citation
For the general framework of AudioCraft, please cite the following.
@inproceedings{copet2023simple,
title={Simple and Controllable Music Generation},
author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
}
When referring to a specific model, please cite as mentioned in the model-specific README, e.g., ./docs/MUSICGEN.md, ./docs/AUDIOGEN.md, etc.