facebookresearch/audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Top Related Projects

  • Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch
  • 🔊 Text-Prompted Generative Audio Model
  • AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
  • Muzic: Music Understanding and Generation with Artificial Intelligence
  • Code for the paper "Jukebox: A Generative Model for Music"

Quick Overview

AudioCraft is a PyTorch library and collection of models for audio generation developed by Facebook Research. It includes state-of-the-art models for text-to-audio, text-to-music, and audio compression tasks, with a focus on high-quality and controllable audio synthesis.

Pros

  • Offers cutting-edge models for various audio generation tasks
  • Provides pre-trained models for easy use and fine-tuning
  • Supports both CPU and GPU acceleration for efficient processing
  • Includes comprehensive documentation and examples

Cons

  • Requires significant computational resources for training and inference
  • Limited to the PyTorch ecosystem, which may not suit all developers
  • Steep learning curve for those unfamiliar with deep learning audio techniques
  • Potential ethical concerns regarding AI-generated audio content

Code Examples

  1. Loading and using a pre-trained MusicGen model:
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-medium')
descriptions = ['An electronic dance track with a strong beat']
wav = model.generate(descriptions, progress=True)  # returns a batch of waveforms

torchaudio.save('generated_music.wav', wav[0].cpu(), model.sample_rate)
  2. Generating audio from text using AudioGen:
import torchaudio
from audiocraft.models import AudioGen

model = AudioGen.get_pretrained('facebook/audiogen-medium')
descriptions = ['A dog barking in the distance']
wav = model.generate(descriptions, progress=True)

torchaudio.save('generated_audio.wav', wav[0].cpu(), model.sample_rate)
  3. Compressing and reconstructing audio using EnCodec:
import torch
import torchaudio
from audiocraft.models import CompressionModel

model = CompressionModel.get_pretrained('facebook/encodec_24khz')
wav, sr = torchaudio.load('input_audio.wav')  # this checkpoint expects mono audio
wav = torchaudio.functional.resample(wav, sr, model.sample_rate)
wav = wav.unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    codes, scale = model.encode(wav)        # discrete tokens plus an optional scale
    decoded_wav = model.decode(codes, scale)

torchaudio.save('reconstructed_audio.wav', decoded_wav.squeeze(0).cpu(), model.sample_rate)
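
The MusicGen and AudioGen examples above use each model's default generation settings. Output length and sampling behavior can be tuned before calling generate() with set_generation_params; a minimal sketch (the values are illustrative, not recommendations):

model.set_generation_params(
    duration=10,        # length of the generated audio, in seconds
    use_sampling=True,  # sample from the token distribution rather than decoding greedily
    top_k=250,          # restrict sampling to the 250 most likely tokens
)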

Getting Started

To get started with AudioCraft:

  1. Install the library:
pip install audiocraft
  2. Import and use a model:
from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('facebook/musicgen-small')
wav = model.generate(['A cheerful country song with acoustic guitar'])
  3. Save the generated audio:
import torchaudio
torchaudio.save('output.wav', wav[0].cpu(), model.sample_rate)
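
The melody variant of MusicGen can additionally condition on a reference melody. A minimal sketch, assuming a reference clip melody.wav exists on disk:

import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)
melody, sr = torchaudio.load('melody.wav')  # reference melody to follow
# Condition on both the text description and the melody's chroma features.
wav = model.generate_with_chroma(['A cheerful country song'], melody[None], sr)
torchaudio.save('melody_conditioned.wav', wav[0].cpu(), model.sample_rate)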

Competitor Comparisons

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch

Pros of MusicLM-PyTorch

  • Simpler implementation, easier to understand and modify
  • Focuses specifically on music generation
  • More lightweight and potentially faster to train

Cons of MusicLM-PyTorch

  • Less comprehensive feature set compared to AudioCraft
  • May require more manual setup and configuration
  • Limited to music generation, while AudioCraft covers broader audio tasks

Code Comparison

MusicLM-PyTorch:

from musiclm_pytorch import MusicLM

# Hyperparameters shown for illustration; the actual MusicLM constructor in
# musiclm-pytorch wires together a MuLaN embedder and an AudioLM (see its README).
model = MusicLM(
    dim = 512,
    depth = 6,
    heads = 8,
    dim_head = 64,
    max_seq_len = 1024
)

AudioCraft:

import audiocraft
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('medium')
model.set_generation_params(duration=8)
wav = model.generate_unconditional(4)

MusicLM-PyTorch provides a more low-level implementation, allowing for greater customization of model architecture. AudioCraft offers a higher-level API with pre-trained models, making it easier to get started with audio generation tasks out of the box.

35,243

🔊 Text-Prompted Generative Audio Model

Pros of Bark

  • Supports multi-language text-to-speech generation
  • Offers voice cloning capabilities
  • Provides more fine-grained control over speech characteristics

Cons of Bark

  • Limited to speech generation only
  • Requires more computational resources for inference
  • Less extensive documentation and community support

Code Comparison

Bark:

from bark import SAMPLE_RATE, generate_audio, preload_models

preload_models()
text = "Hello, world!"
audio_array = generate_audio(text)

AudioCraft:

import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('small')
wav = model.generate_unconditional(4)
torchaudio.save('audio.wav', wav[0].cpu(), model.sample_rate)

Key Differences

  • Bark focuses on speech synthesis, while AudioCraft primarily targets music generation
  • AudioCraft offers more comprehensive audio manipulation capabilities
  • Bark provides more options for voice customization
  • AudioCraft has better integration with the PyTorch ecosystem
  • Bark's codebase is more accessible for beginners

Use Cases

  • Bark: Text-to-speech applications, voice assistants, voice cloning
  • AudioCraft: Music production, sound design, audio content creation

Community and Support

  • AudioCraft benefits from Facebook's backing and larger developer community
  • Bark has a growing community but less extensive resources

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Pros of AudioGPT

  • Offers a more comprehensive suite of audio-related tasks, including speech recognition and audio captioning
  • Provides a user-friendly interface for interacting with the model
  • Integrates multiple pre-trained models for various audio tasks

Cons of AudioGPT

  • Less focused on high-quality music generation compared to Audiocraft
  • May require more computational resources due to its broader scope
  • Documentation and examples are less extensive than Audiocraft's

Code Comparison

AudioGPT:

# Illustrative pseudocode: AudioGPT is usually driven through its Gradio demo
# rather than an installable Python API, so this interface is hypothetical.
from audiogpt import AudioGPT

model = AudioGPT()
result = model.generate_audio("Create a piano melody")
result.save("output.wav")

Audiocraft:

import audiocraft.models as models
from audiocraft.data.audio import audio_write

model = models.MusicGen.get_pretrained()
model.set_generation_params(duration=10)  # target length in seconds
wav = model.generate(["Create a piano melody"])
audio_write("output", wav[0].cpu(), model.sample_rate, strategy="loudness")

Both repositories offer powerful audio generation capabilities, but they cater to different use cases. Audiocraft focuses primarily on high-quality music generation, while AudioGPT provides a broader range of audio-related tasks. The code comparison shows that Audiocraft's API is more specialized for music generation, while AudioGPT offers a more general-purpose interface for various audio tasks.

4,452

Muzic: Music Understanding and Generation with Artificial Intelligence

Pros of Muzic

  • Broader scope, covering various music AI tasks like generation, understanding, and editing
  • Includes tools for symbolic music generation and audio-to-MIDI conversion
  • Offers a wider range of pre-trained models for different music-related tasks

Cons of Muzic

  • Less focused on high-quality audio generation compared to Audiocraft
  • May require more setup and configuration due to its broader feature set
  • Documentation is less comprehensive for specific audio generation tasks

Code Comparison

Muzic (Symbolic music generation):

# Illustrative import path: Muzic ships as per-project research code,
# not as an installable `muzic` package.
from muzic.symbolic.musicbert import MusicBERT

model = MusicBERT.from_pretrained('symbolic/musicbert-base')
generated = model.generate(input_ids, max_length=512)  # input_ids: tokenized symbolic music

Audiocraft (Audio generation):

from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=8)  # duration is set here, not on generate()
wav = model.generate(['An upbeat jazz piece with saxophone'])

Both repositories offer powerful tools for music AI, but Muzic provides a broader range of features for various music-related tasks, while Audiocraft focuses more on high-quality audio generation. The choice between them depends on the specific requirements of your project.

7,760

Code for the paper "Jukebox: A Generative Model for Music"

Pros of Jukebox

  • More established project with a longer history and potentially more stability
  • Capable of generating complete songs with vocals and lyrics
  • Offers a wider range of musical styles and genres

Cons of Jukebox

  • Requires more computational resources and longer generation times
  • Less user-friendly interface and setup process
  • Output quality can be inconsistent, especially for vocals

Code Comparison

Jukebox:

# Condensed illustration of Jukebox's multi-stage pipeline (a VQ-VAE plus
# hierarchical priors); the real repo requires model/hparams setup first.
vqvae, *priors = MODELS['5b']
raw_to_tokens = vqvae.encode(x.unsqueeze(0)).squeeze(0)
tokens_to_prior = priors[-1].encode(raw_to_tokens.unsqueeze(0)).squeeze(0)

Audiocraft:

model = MusicGen.get_pretrained('melody')
wav = model.generate_unconditional(
    num_samples=1,
    progress=True,
)

Audiocraft offers a more streamlined API for generating music, while Jukebox requires more complex setup and encoding steps. Audiocraft's approach is more user-friendly and easier to integrate into projects, but Jukebox provides more fine-grained control over the generation process.

README

AudioCraft

AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.

Installation

AudioCraft requires Python 3.9 and PyTorch 2.1.0. To install AudioCraft, run the following:

# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
python -m pip install 'torch==2.1.0'
# You might need the following before trying to install the packages
python -m pip install setuptools wheel
# Then proceed to one of the following
python -m pip install -U audiocraft  # stable release
python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
python -m pip install -e .  # or if you cloned the repo locally (mandatory if you want to train).
python -m pip install -e '.[wm]'  # if you want to train a watermarking model

We also recommend having ffmpeg installed, either through your system or Anaconda:

sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
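
A quick end-to-end check that the install works is to generate a short clip and write it as mp3, which exercises the ffmpeg dependency. A minimal sketch (model name and prompt are arbitrary; the checkpoint downloads on first use):

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=4)  # keep it short for a quick check
wav = model.generate(['A short synth arpeggio'])
# format='mp3' routes the write through ffmpeg.
audio_write('install_check', wav[0].cpu(), model.sample_rate, format='mp3')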

Models

At the moment, AudioCraft contains the training code and inference code for:

  • MusicGen: A state-of-the-art controllable text-to-music model.
  • AudioGen: A state-of-the-art text-to-sound model.
  • EnCodec: A state-of-the-art high fidelity neural audio codec.
  • Multi Band Diffusion: An EnCodec-compatible decoder using diffusion (see the sketch after this list).
  • MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.
  • AudioSeal: A state-of-the-art audio watermarking model.
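
For instance, Multi Band Diffusion can stand in for EnCodec's decoder when generating with MusicGen. A sketch following the pattern in the MusicGen docs (checkpoints download on first use):

import torchaudio
from audiocraft.models import MusicGen, MultiBandDiffusion

model = MusicGen.get_pretrained('facebook/musicgen-small')
mbd = MultiBandDiffusion.get_mbd_musicgen()
model.set_generation_params(duration=8)
# return_tokens=True exposes the EnCodec tokens so the diffusion decoder can use them.
wav, tokens = model.generate(['A calm piano piece'], return_tokens=True)
wav_diffusion = mbd.tokens_to_wav(tokens)
torchaudio.save('mbd_decoded.wav', wav_diffusion[0].cpu(), model.sample_rate)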

Training code

AudioCraft contains PyTorch components for deep learning research in audio, along with training pipelines for the developed models. For a general introduction to AudioCraft's design principles and instructions on developing your own training pipeline, refer to the AudioCraft training documentation.

For reproducing existing work and using the developed training pipelines, refer to the instructions for each specific model, which provide pointers to configuration, example grids, and model/task-specific information and FAQ.
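
Concretely, training runs are launched through the Dora experiment manager described in those docs. An illustrative invocation (the solver name follows the MusicGen training docs; adjust it to the model and scale you are training):

# From a local clone installed with `python -m pip install -e .`
dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small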

API documentation

We provide some API documentation for AudioCraft.

FAQ

Is the training code available?

Yes! We provide the training code for EnCodec, MusicGen and Multi Band Diffusion.

Where are the models stored?

Hugging Face stores the models in a specific cache location, which can be overridden for the AudioCraft models by setting the AUDIOCRAFT_CACHE_DIR environment variable. To change the cache location of the other Hugging Face models, check out the Hugging Face Transformers documentation for the cache setup. Finally, if you use a model that relies on Demucs (e.g., musicgen-melody) and want to change the download location for Demucs, refer to the Torch Hub documentation.
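
For example, to redirect the AudioCraft checkpoints before running a script (the path is illustrative):

export AUDIOCRAFT_CACHE_DIR=/path/to/audiocraft/cache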

License

  • The code in this repository is released under the MIT license as found in the LICENSE file.
  • The model weights in this repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.

Citation

For the general framework of AudioCraft, please cite the following.

@inproceedings{copet2023simple,
    title={Simple and Controllable Music Generation},
    author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
}

When referring to a specific model, please cite as mentioned in the model-specific README, e.g., ./docs/MUSICGEN.md, ./docs/AUDIOGEN.md, etc.