essentia
C++ library for audio and music analysis, description and synthesis, including Python bindings
Top Related Projects
- librosa: Python library for audio and music analysis
- aubio: a library for audio and music analysis
- Annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
- Magenta: Music and Art Generation with Machine Intelligence
- pytorch/audio: Data manipulation and transformation for audio signal processing, powered by PyTorch
Quick Overview
Essentia is an open-source C++ library for audio analysis and music information retrieval. It provides a comprehensive set of algorithms for extracting features from audio signals, including spectral, temporal, and high-level descriptors. Essentia is designed to be efficient, modular, and easy to use, making it suitable for both research and commercial applications.
Pros
- Extensive collection of audio analysis algorithms and music information retrieval tools
- High performance and efficiency due to C++ implementation
- Cross-platform compatibility (Linux, macOS, Windows)
- Python bindings for easier integration and prototyping
Cons
- Steep learning curve for beginners due to its comprehensive nature
- Limited documentation for some advanced features
- Requires C++ knowledge for optimal usage and customization
- Dependency management can be challenging for some users
Code Examples
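- Loading an audio file and computing the spectrum of one frame:
import essentia.standard as es
# Load audio file
audio = es.MonoLoader(filename='audio.wav', sampleRate=44100)()
# Compute the spectrum of the first 2048-sample frame
# (Spectrum is meant for a short frame rather than a whole file)
frame = audio[:2048]
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
spec = spectrum(w(frame))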
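To make the windowing and spectrum steps less opaque, here is a minimal pure-Python sketch of the same idea: multiply a frame by a Hann window, then take the magnitudes of the DFT bins from 0 to N/2. The helper names `hann_window` and `magnitude_spectrum` and the 64-sample test sine are assumptions made for illustration. This is not Essentia's implementation, and its normalisation and window details may differ.

```python
import cmath
import math

def hann_window(n):
    """Return an n-point symmetric Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 of a real-valued frame."""
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        bins.append(abs(acc))
    return bins

# Hypothetical test signal: a sine completing exactly 8 cycles in 64 samples
n = 64
frame = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
windowed = [x * w for x, w in zip(frame, hann_window(n))]
spec = magnitude_spectrum(windowed)

# Index of the strongest bin; for this signal it is the bin of the sine
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```

A frame of N samples yields N/2 + 1 magnitude bins, which is why a 2048-sample frame gives a 1025-bin spectrum.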
- Extracting the beat positions from an audio file:
import essentia.standard as es
# Load audio file
audio = es.MonoLoader(filename='audio.wav', sampleRate=44100)()
# Extract beat positions
rhythm_extractor = es.RhythmExtractor2013()
bpm, beats, beats_confidence, _, beats_intervals = rhythm_extractor(audio)
print(f"BPM: {bpm}")
print(f"Beat positions: {beats}")
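The relationship between beat positions and tempo can be sketched without Essentia. The hypothetical helper below, `bpm_from_beats`, converts beat times in seconds into a tempo estimate using the median inter-beat interval. It is an illustration only and is not claimed to be how `RhythmExtractor2013` derives its BPM value.

```python
from statistics import median

def bpm_from_beats(beat_times):
    """Estimate tempo in BPM from beat positions given in seconds."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    # Time between consecutive beats, in seconds
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    # The median is less sensitive to a few missed or extra beats than the mean
    return 60.0 / median(intervals)

# Hypothetical beat positions, one beat every half second
beats = [0.5, 1.0, 1.5, 2.0, 2.5]
estimated_bpm = bpm_from_beats(beats)
print(estimated_bpm)
```

Comparing such an estimate with the reported BPM is a quick sanity check on the extracted beat positions.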
- Computing MFCCs (Mel-Frequency Cepstral Coefficients):
import essentia.standard as es
# Load audio file
audio = es.MonoLoader(filename='audio.wav', sampleRate=44100)()
# Compute MFCCs
window = es.Windowing(type='hann')
spectrum = es.Spectrum()
mfcc = es.MFCC()
mfccs = []
for frame in es.FrameGenerator(audio, frameSize=1024, hopSize=512):
    spec = spectrum(window(frame))
    mfcc_bands, mfcc_coeffs = mfcc(spec)
    mfccs.append(mfcc_coeffs)
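A per-frame list like `mfccs` is often reduced to one fixed-length vector per track. The sketch below shows one simple way to do that, using the mean and standard deviation of each coefficient across frames. The function name `summarize_frames` and the toy input are assumptions for illustration. With real Essentia output, converting each frame with `.tolist()` first (or using NumPy) would be the safer route.

```python
from statistics import mean, pstdev

def summarize_frames(frames):
    """Return per-coefficient (means, standard deviations) across frames."""
    if not frames:
        raise ValueError("no frames to summarize")
    columns = list(zip(*frames))  # transpose: one tuple per coefficient
    return [mean(c) for c in columns], [pstdev(c) for c in columns]

# Hypothetical stand-in for the `mfccs` list: 3 frames x 2 coefficients
mfccs = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
means, stds = summarize_frames(mfccs)
```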
Getting Started
To get started with Essentia, follow these steps:
1. Install Essentia using pip:
pip install essentia
2. Import the library in your Python script:
import essentia.standard as es
3. Load an audio file and perform basic analysis:
# Load audio file
audio = es.MonoLoader(filename='audio.wav', sampleRate=44100)()
# Compute basic features
duration = len(audio) / 44100
loudness = es.Loudness()(audio)
# PitchYinFFT expects the spectrum of a single frame, so analyze the first 2048 samples
frame = audio[:2048]
pitch, confidence = es.PitchYinFFT()(es.Spectrum()(es.Windowing(type='hann')(frame)))
print(f"Duration: {duration:.2f} seconds")
print(f"Loudness: {loudness:.2f}")
print(f"Pitch: {pitch:.2f} Hz (confidence: {confidence:.2f})")
For more advanced usage and detailed documentation, refer to the official Essentia website and GitHub repository.
Competitor Comparisons
Python library for audio and music analysis
Pros of librosa
- Easier to install and use, with fewer dependencies
- More Pythonic API and better integration with NumPy and SciPy
- Extensive documentation and tutorials for beginners
Cons of librosa
- Slower performance for some operations compared to Essentia
- Limited support for real-time processing
- Fewer advanced audio analysis features
Code Comparison
librosa:
import librosa
y, sr = librosa.load('audio.wav')
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr)
Essentia:
import essentia.standard as es
audio = es.MonoLoader(filename='audio.wav')()
rhythm_extractor = es.RhythmExtractor2013()
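bpm, beats, _, _, _ = rhythm_extractor(audio)  # RhythmExtractor2013 returns five values
# MFCC expects the spectrum of a single frame, not raw audio
frame_spectrum = es.Spectrum()(es.Windowing(type='hann')(audio[:2048]))
mfcc_bands, mfcc_coeffs = es.MFCC()(frame_spectrum)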
Both libraries offer similar functionality for basic audio analysis tasks, but Essentia provides more low-level control and advanced features. librosa is generally more user-friendly and better suited for quick prototyping and research, while Essentia is more powerful for complex audio processing tasks and real-time applications.
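One practical difference when comparing the two snippets: by default `librosa.beat.beat_track` returns beat positions as frame indices, while `RhythmExtractor2013` returns them in seconds. The sketch below shows the conversion, assuming librosa's default hop length of 512 samples and default load rate of 22050 Hz. The helper `frames_to_seconds` and the sample frame indices are hypothetical; librosa ships its own `librosa.frames_to_time` for the same purpose.

```python
def frames_to_seconds(frames, hop_length=512, sr=22050):
    """Convert frame indices to times in seconds (frame * hop / sample rate)."""
    return [f * hop_length / sr for f in frames]

# Hypothetical frame indices, as beat_track might return them
beat_frames = [0, 43, 86]
beat_times = frames_to_seconds(beat_frames)
print(beat_times)
```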
a library for audio and music analysis
Pros of aubio
- Lightweight and efficient, with a focus on real-time processing
- Supports multiple programming languages through bindings
- Extensive command-line tools for quick audio analysis
Cons of aubio
- Smaller feature set compared to Essentia
- Less active development and community support
- Limited documentation and examples for advanced use cases
Code Comparison
Essentia:
import essentia.standard as es
audio = es.MonoLoader(filename='audio.wav')()
beats, confidence = es.BeatTrackerMultiFeature()(audio)
aubio:
import aubio
source = aubio.source('audio.wav')
tempo = aubio.tempo("default", 1024, 512, source.samplerate)
beats = []
while True:
    samples, read = source()
    is_beat = tempo(samples)
    if is_beat:
        beats.append(tempo.get_last_s())
    if read < source.hop_size:
        break
Both libraries offer beat tracking functionality, but Essentia provides a more straightforward API for this task. aubio requires more manual setup and iteration, which can offer greater flexibility but may be less convenient for simple use cases.
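Since both snippets end with a list of beat times in seconds, their outputs can be compared directly. The following is a rough sketch of such a comparison: it greedily matches beats that lie within a tolerance window and reports an F-measure. The function names, the 70 ms tolerance, and the sample values are assumptions chosen for illustration. Greedy matching is a simplification, and dedicated evaluation tools handle the matching more carefully.

```python
def match_beats(reference, estimated, tolerance=0.07):
    """Count estimated beats within `tolerance` seconds of an unused reference beat."""
    unused = sorted(reference)
    hits = 0
    for t in sorted(estimated):
        for i, r in enumerate(unused):
            if abs(t - r) <= tolerance:
                hits += 1
                del unused[i]  # each reference beat can be matched only once
                break
    return hits

def beat_f_measure(reference, estimated, tolerance=0.07):
    """Harmonic mean of precision and recall for the matched beats."""
    hits = match_beats(reference, estimated, tolerance)
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hypothetical outputs of two beat trackers, in seconds
tracker_a = [0.5, 1.0, 1.5, 2.0]
tracker_b = [0.52, 1.01, 1.7, 2.03]
agreement = beat_f_measure(tracker_a, tracker_b)
```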
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Pros of Annoy
- Specialized for approximate nearest neighbor search, making it highly efficient for this specific task
- Lightweight and easy to integrate into existing projects
- Supports multiple distance metrics (Euclidean, Manhattan, Cosine, etc.)
Cons of Annoy
- Limited to a single specific task, unlike Essentia's broader audio analysis capabilities
- Less suitable for complex audio processing tasks or feature extraction
- Smaller community and fewer contributors compared to Essentia
Code Comparison
Annoy (C++ with Python bindings):
import random
from annoy import AnnoyIndex
f = 40  # vector dimensionality
t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)
t.build(10)  # build 10 trees
Essentia (C++ with Python bindings):
import essentia.standard as es
audio = es.MonoLoader(filename='audio.wav')()
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
mfcc = es.MFCC()
for frame in es.FrameGenerator(audio, frameSize=1024, hopSize=512):
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))
This comparison highlights the specialized nature of Annoy for nearest neighbor search, while Essentia offers a broader range of audio analysis tools. The code examples demonstrate Annoy's focus on indexing and searching, versus Essentia's audio processing capabilities.
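The two tools can complement each other: feature vectors extracted with an audio library can be indexed for similarity search. As a point of reference, here is a small exact (brute-force) nearest-neighbor sketch, the operation that an approximate index speeds up on large collections. It uses the distance Annoy documents for its 'angular' metric, sqrt(2(1 - cos(u, v))). The track names and three-dimensional vectors are made up for illustration, and the sketch does not guard against zero-length vectors.

```python
import math

def angular_distance(u, v):
    """Euclidean distance between the normalized vectors: sqrt(2 * (1 - cos))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))  # clamp tiny negative rounding errors

def nearest(query, items, k=1):
    """Return the k item names closest to `query` by exhaustive search."""
    ranked = sorted(items, key=lambda name: angular_distance(query, items[name]))
    return ranked[:k]

# Hypothetical per-track feature vectors (e.g. averaged descriptors)
features = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.0, 1.0, 0.0],
    "track_c": [0.9, 0.1, 0.0],
}
closest = nearest([1.0, 0.02, 0.0], features, k=2)
print(closest)
```

Exhaustive search is linear in the number of items per query, which is the cost an approximate index trades accuracy to avoid.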
Magenta: Music and Art Generation with Machine Intelligence
Pros of Magenta
- Focuses on machine learning for music and art generation
- Integrates well with TensorFlow and other Google AI tools
- Offers pre-trained models for quick experimentation
Cons of Magenta
- Narrower scope, primarily for creative AI applications
- Steeper learning curve for those not familiar with machine learning
- Less comprehensive audio analysis capabilities
Code Comparison
Magenta (Python):
# Magenta's music utilities are distributed as the note_seq package
import note_seq
# midi_file is the path to a MIDI file
sequence = note_seq.midi_file_to_note_sequence(midi_file)
notes = list(sequence.notes)
Essentia (C++):
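#include <essentia/algorithmfactory.h>
#include <essentia/essentiamath.h>
using namespace essentia;
using namespace essentia::standard;
// Inside main(): register the algorithms before using the factory
essentia::init();
AlgorithmFactory& factory = AlgorithmFactory::instance();
Algorithm* loader = factory.create("MonoLoader", "filename", audiofile);  // audiofile: std::string path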
Key Differences
Magenta is tailored for AI-driven music creation and artistic applications, while Essentia provides a broader set of audio analysis tools. Magenta leverages machine learning techniques, particularly with TensorFlow, whereas Essentia focuses on signal processing and feature extraction from audio.
Essentia offers more low-level control and a wider range of audio analysis algorithms, making it suitable for various audio processing tasks. Magenta, on the other hand, excels in generative tasks and creative applications of AI in music and art.
Data manipulation and transformation for audio signal processing, powered by PyTorch
Pros of pytorch/audio
- Seamless integration with PyTorch ecosystem for deep learning tasks
- Extensive GPU acceleration support for faster processing
- Comprehensive documentation and active community support
Cons of pytorch/audio
- Steeper learning curve for users not familiar with PyTorch
- More focused on deep learning applications, less versatile for general audio processing
- Larger memory footprint due to PyTorch dependencies
Code Comparison
essentia:
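#include <essentia/algorithmfactory.h>
#include <essentia/essentiamath.h>
using namespace essentia;
using namespace essentia::standard;
// Inside main(): register the algorithms before using the factory
essentia::init();
AlgorithmFactory& factory = AlgorithmFactory::instance();
Algorithm* mfcc = factory.create("MFCC");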
pytorch/audio:
import torchaudio
waveform, sample_rate = torchaudio.load("audio.wav")
mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate)
mfcc_features = mfcc(waveform)
Both libraries offer MFCC extraction. The Essentia example uses the C++ factory API (the same algorithm is also available through the Python bindings), while pytorch/audio works in Python and integrates with PyTorch tensors. Essentia provides more low-level control, while pytorch/audio offers a more streamlined approach for deep learning applications.
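Whichever library is used, the last stage of MFCC extraction is conceptually the same: take the logarithm of the mel-band energies and apply a discrete cosine transform. The sketch below illustrates that stage only, with an unnormalized DCT-II. The function names, the energy floor, and the choice of 40 bands and 13 coefficients are assumptions for illustration. Real implementations differ in scaling, log base, and liftering, so the numbers will not match either library's output.

```python
import math

def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first `num_coeffs` coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]

def mfcc_from_mel_energies(mel_energies, num_coeffs=13, floor=1e-10):
    """Log-compress mel-band energies, then decorrelate them with a DCT."""
    log_energies = [math.log(max(e, floor)) for e in mel_energies]  # floor avoids log(0)
    return dct_ii(log_energies, num_coeffs)

# Hypothetical input: 40 mel bands with identical energy (a flat spectrum)
mel_energies = [math.e] * 40
coeffs = mfcc_from_mel_energies(mel_energies)
```

For a flat spectrum all the energy lands in the first coefficient and the rest are close to zero, which shows how the higher coefficients capture the shape of the spectral envelope rather than its overall level.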
README
Essentia
Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPLv3 license. It contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. The library is also wrapped in Python and includes a number of predefined executable extractors for the available music descriptors, which facilitates its use for fast prototyping and allows setting up research experiments very rapidly. Furthermore, it includes a Vamp plugin to be used with Sonic Visualiser for visualization purposes. Essentia is designed with a focus on the robustness of the provided music descriptors and is optimized in terms of the computational cost of the algorithms. The provided functionality, specifically the music descriptors included in-the-box and signal processing algorithms, is easily expandable and allows for both research experiments and development of large-scale industrial applications.
Documentation online: http://essentia.upf.edu
Installation
The library is cross-platform and currently supports Linux, macOS, Windows, iOS and Android systems. Read installation instructions:
Install from master for the latest updates.
To use in Python (Linux x86_64, i686): pip install essentia or pip install essentia-tensorflow.
Docker images: https://hub.docker.com/r/mtgupf/essentia/
You can download and use prebuilt static binaries for a number of Essentia's command-line music extractors instead of installing the complete library.
Quick start
Quick start using Python:
- http://essentia.upf.edu/documentation/essentia_python_tutorial.html
- Jupyter Notebook Essentia tutorial
Command-line tools to compute common music descriptors:
Asking for help
Read frequently asked questions.
Create an issue on GitHub or open a new discussion if your question has not been answered before.
Versions
Official releases: https://github.com/MTG/essentia/releases
GitHub branches:
- master: latest updates; if you run into a problem, try it first.
If you use example extractors (located in src/examples), or your own code employing Essentia algorithms to compute descriptors, you should be aware of possible incompatibilities when using different versions of Essentia.
How to contribute
We are more than happy to collaborate and receive your contributions to Essentia. The best way to submit your code is to create pull requests to our GitHub repository following our contribution policy. By submitting your code you certify that it complies with the Developer's Certificate of Origin. For more details see: http://essentia.upf.edu/documentation/contribute.html
You are also more than welcome to suggest any improvements, including proposals for new algorithms, etc.