crepe
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Top Related Projects
librosa: Python library for audio and music analysis
aubio: a library for audio and music analysis
Essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
madmom: Python audio and music signal processing library
Basic-pitch: A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Quick Overview
CREPE (Convolutional Representation for Pitch Estimation) is an open-source pitch tracking library implemented in Python. It uses a deep convolutional neural network to estimate the fundamental frequency (F0) of monophonic audio signals, providing accurate pitch detection for various musical and speech applications.
Pros
- High accuracy in pitch estimation, especially for complex audio signals
- Robust performance across different instruments and vocal styles
- Easy to use with a simple Python API
- Simple offline API; short buffers can also be passed to crepe.predict for near-real-time use
Cons
- Requires significant computational resources, especially for real-time processing
- Limited to monophonic audio signals (single pitch at a time)
- Dependency on TensorFlow, which may be overkill for some applications
- Potential latency issues in real-time scenarios due to the complexity of the model
Code Examples
- Basic pitch estimation from an audio file:
import crepe
from scipy.io import wavfile
sr, audio = wavfile.read('audio.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr, model_capacity='full')
- Real-time pitch estimation using PyAudio:
import crepe
import pyaudio
import numpy as np
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=16000, input=True, frames_per_buffer=1024)
while True:
    # Read one 1024-sample buffer (64 ms at 16 kHz) and estimate its pitch
    data = np.frombuffer(stream.read(1024), dtype=np.float32)
    time, frequency, confidence, activation = crepe.predict(data, 16000, center=True, step_size=64)
    print(f"Pitch: {frequency[0]:.2f} Hz, Confidence: {confidence[0]:.2f}")
- Saving pitch estimation results to a CSV file:
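import crepe
import numpy as np
from scipy.io import wavfile
# crepe has no audio loader of its own; read the WAV file with SciPy
sr, audio = wavfile.read('audio.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr)
# comments='' keeps the header line free of the default '# ' prefix
np.savetxt('pitch.csv', np.vstack((time, frequency, confidence)).T, delimiter=',', header='time,frequency,confidence', comments='')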
Getting Started
To get started with CREPE, follow these steps:
- Install CREPE using pip:
pip install crepe
- Import the library and use it to estimate pitch:
import crepe
from scipy.io import wavfile
sr, audio = wavfile.read('path/to/your/audio/file.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr)
- For more advanced usage and options, refer to the CREPE documentation.
Competitor Comparisons
librosa: Python library for audio and music analysis
Pros of librosa
- Broader audio processing capabilities beyond pitch detection
- More extensive documentation and community support
- Wider range of audio analysis features (e.g., onset detection, beat tracking)
Cons of librosa
- Less specialized for pitch detection compared to CREPE
- May be slower for pitch estimation tasks
- Requires more setup and dependencies for pitch-specific tasks
Code Comparison
librosa pitch detection:
import librosa
y, sr = librosa.load('audio.wav')
pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
CREPE pitch detection:
import crepe
from scipy.io import wavfile
# crepe does not provide a loader; read the WAV file with SciPy
sr, audio = wavfile.read('audio.wav')
time, frequency, confidence, _ = crepe.predict(audio, sr)
Both libraries offer pitch detection capabilities, but CREPE is more focused on this specific task, while librosa provides a broader range of audio processing functions. CREPE may offer more accurate pitch detection in some cases, especially for monophonic audio. However, librosa's versatility makes it a popular choice for various audio analysis tasks beyond pitch detection.
aubio: a library for audio and music analysis
Pros of aubio
- Broader audio analysis capabilities beyond pitch detection
- Supports multiple programming languages (C, Python, MATLAB)
- Longer development history and more mature codebase
Cons of aubio
- Less specialized for pitch detection compared to crepe
- May require more setup and configuration for specific use cases
- Potentially slower performance for pitch detection tasks
Code Comparison
aubio (Python):
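import aubio
# Read the file in hops of 2048 samples (resampled to 44.1 kHz if needed)
source = aubio.source('audio.wav', 44100, 2048)
pitch_o = aubio.pitch("yin", 2048, 2048, 44100)
samples, read = source()  # first block of samples
pitch = pitch_o(samples)[0]  # estimated pitch in Hz for that block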
crepe (Python):
import crepe
time, frequency, confidence, activation = crepe.predict(audio, sr)
Both libraries offer pitch detection functionality, but aubio provides a more flexible approach with various algorithms, while crepe focuses on a single, highly accurate neural network-based method. aubio's code requires more setup, whereas crepe's interface is more straightforward for pitch detection tasks.
Essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
Pros of Essentia
- Comprehensive audio analysis library with a wide range of features beyond pitch detection
- Supports multiple programming languages (C++, Python, JavaScript)
- Actively maintained with regular updates and a large community
Cons of Essentia
- More complex setup and usage compared to CREPE
- Larger codebase and dependencies, potentially slower for simple pitch detection tasks
- Steeper learning curve for beginners
Code Comparison
Essentia (Python):
import essentia.standard as es
audio = es.MonoLoader(filename='audio.wav', sampleRate=44100)()
pitch, confidence = es.PredominantPitchMelodia()(audio)
CREPE (Python):
import crepe
import librosa
audio, sr = librosa.load('audio.wav')
time, frequency, confidence, _ = crepe.predict(audio, sr)
Both libraries offer pitch detection functionality, but Essentia provides a more comprehensive set of audio analysis tools, while CREPE focuses specifically on pitch estimation using deep learning techniques. Essentia's setup is more involved, but it offers greater flexibility for various audio processing tasks. CREPE, on the other hand, provides a simpler interface for pitch detection with potentially higher accuracy in certain scenarios.
madmom: Python audio and music signal processing library
Pros of madmom
- Broader range of audio analysis features (onset detection, beat tracking, chord recognition, etc.)
- More comprehensive documentation and examples
- Actively maintained with regular updates
Cons of madmom
- Steeper learning curve due to more complex architecture
- Heavier computational requirements for some features
- Less specialized for pitch estimation compared to CREPE
Code Comparison
madmom example (beat tracking):
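from madmom.features.beats import RNNBeatProcessor
from madmom.features.tempo import TempoEstimationProcessor
# The RNN outputs a beat activation function sampled at 100 frames per second
act = RNNBeatProcessor()('audio_file.wav')
# Tempo estimation needs the frame rate of that activation function
tempo = TempoEstimationProcessor(fps=100)(act)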
CREPE example (pitch estimation):
import crepe
from scipy.io import wavfile
sr, audio = wavfile.read('audio_file.wav')
time, frequency, confidence, _ = crepe.predict(audio, sr)
Both libraries offer Python-based audio analysis, but madmom provides a wider range of features while CREPE focuses specifically on pitch estimation. madmom's code structure is more complex, reflecting its broader functionality, while CREPE offers a simpler interface for its specialized task.
Basic-pitch: A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Pros of Basic-pitch
- Supports multi-pitch detection, allowing for polyphonic analysis
- Includes MIDI conversion capabilities out-of-the-box
- Offers a command-line interface for easy integration into workflows
Cons of Basic-pitch
- May have lower pitch accuracy for monophonic audio compared to CREPE
- Requires more computational resources due to its multi-pitch capabilities
- Has a larger model size, which can impact loading times and memory usage
Code Comparison
CREPE:
import crepe
from scipy.io import wavfile
sr, audio = wavfile.read('audio.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr)
Basic-pitch:
from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH
# predict returns the raw model output, a PrettyMIDI object, and the note events
model_output, midi_data, note_events = predict('audio.wav', ICASSP_2022_MODEL_PATH)
Both repositories provide Python interfaces for pitch detection, but Basic-pitch offers additional features like MIDI conversion and multi-pitch analysis. CREPE's implementation is more straightforward for single-pitch detection, while Basic-pitch takes a model path and returns a richer output (raw model output, MIDI data, and note events).
README
CREPE Pitch Tracker
CREPE is a monophonic pitch tracker based on a deep convolutional neural network operating directly on the time-domain waveform input. CREPE is state-of-the-art (as of 2018), outperforming popular pitch trackers such as pYIN and SWIPE:
Further details are provided in the following paper:
CREPE: A Convolutional Representation for Pitch Estimation
Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello.
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
We kindly request that academic publications making use of CREPE cite the aforementioned paper.
Installing CREPE
CREPE is hosted on PyPI. To install, run the following commands in your Python environment:
$ pip install --upgrade tensorflow # if you don't already have tensorflow >= 2.0.0
$ pip install crepe
To install the latest version from source, clone the repository and, from the top-level crepe folder, call:
$ python setup.py install
Using CREPE
Using CREPE from the command line
This package includes a command line utility, crepe, and a pre-trained version of the CREPE model for easy use. To estimate the pitch of audio_file.wav, run:
$ crepe audio_file.wav
or
$ python -m crepe audio_file.wav
The resulting audio_file.f0.csv contains 3 columns: the first with timestamps (a 10 ms hop size is used by default), the second contains the predicted fundamental frequency in Hz, and the third contains the voicing confidence, i.e. the confidence in the presence of a pitch:
time,frequency,confidence
0.00,185.616,0.907112
0.01,186.764,0.844488
0.02,188.356,0.798015
0.03,190.610,0.746729
0.04,192.952,0.771268
0.05,195.191,0.859440
0.06,196.541,0.864447
0.07,197.809,0.827441
0.08,199.678,0.775208
...
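As a rough sketch of how this CSV could be post-processed, the snippet below keeps only the frames whose voicing confidence passes a threshold. It assumes the three-column layout shown above; the helper name voiced_frames and the 0.8 threshold are illustrative choices, not part of CREPE.

```python
import csv
import io

# A few rows copied from the sample output above; in practice you would
# open 'audio_file.f0.csv' instead of using an in-memory string.
SAMPLE_CSV = """time,frequency,confidence
0.00,185.616,0.907112
0.01,186.764,0.844488
0.02,188.356,0.798015
0.03,190.610,0.746729
"""


def voiced_frames(csv_file, min_confidence=0.8):
    """Return (time, frequency) pairs with confidence >= min_confidence.

    Hypothetical helper: the name and the 0.8 default are illustrative.
    """
    reader = csv.DictReader(csv_file)
    return [
        (float(row["time"]), float(row["frequency"]))
        for row in reader
        if float(row["confidence"]) >= min_confidence
    ]


frames = voiced_frames(io.StringIO(SAMPLE_CSV))
print(frames)
```

A suitable threshold depends on the material, so it is worth inspecting the confidence column before fixing a value.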
Timestamps
CREPE uses 10-millisecond time steps by default. This can be adjusted with the --step-size option, which takes the size of the time step in milliseconds. For example, --step-size 50 will calculate pitch every 50 milliseconds.
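To make the relation between step size and timestamps concrete, here is a small sketch that converts a step size in milliseconds into a hop length at 16 kHz (the rate the model is trained on, per the notes further down) and lists the resulting timestamps. The frame-count formula assumes centered frames and is an assumption of this sketch, as is the helper name frame_times.

```python
MODEL_SAMPLE_RATE = 16000  # the model operates on 16 kHz audio


def frame_times(n_samples, step_size_ms=10):
    """Timestamps in seconds for a 16 kHz signal of n_samples samples.

    Assumes centered frames: one frame at time 0 plus one per full hop.
    """
    hop_length = int(MODEL_SAMPLE_RATE * step_size_ms / 1000)
    n_frames = 1 + n_samples // hop_length
    return [i * step_size_ms / 1000 for i in range(n_frames)]


one_second = MODEL_SAMPLE_RATE            # number of samples in one second
print(len(frame_times(one_second)))       # frame count with the default 10 ms step
print(frame_times(one_second, 50)[:4])    # first timestamps with a 50 ms step
```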
Following the convention adopted by popular audio processing libraries such as Essentia and Librosa, from v0.0.5 onwards CREPE will pad the input signal such that the first frame is zero-centered (the center of the frame corresponds to time 0) and generally all frames are centered around their corresponding timestamp, i.e. frame D[:, t] is centered at audio[t * hop_length]. This behavior can be changed by specifying the optional --no-centering flag, in which case the first frame will start at time zero and generally frame D[:, t] will begin at audio[t * hop_length]. Sticking to the default behavior (centered frames) is strongly recommended to avoid misalignment with features and annotations produced by other common audio processing tools.
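The two framing conventions can be illustrated with a short NumPy sketch. It is not CREPE's own code: the 1024-sample frame length is taken from the CREPE paper rather than from this README, and frame_signal is a hypothetical helper.

```python
import numpy as np

FRAME_LENGTH = 1024  # assumed frame size in samples (64 ms at 16 kHz)


def frame_signal(audio, hop_length, center=True):
    """Slice a 1-D signal into overlapping frames of FRAME_LENGTH samples."""
    if center:
        # Zero-pad half a frame on each side so frame t is centered on
        # audio[t * hop_length]; without padding, frame t starts there.
        audio = np.pad(audio, FRAME_LENGTH // 2, mode="constant")
    n_frames = 1 + (len(audio) - FRAME_LENGTH) // hop_length
    return np.stack(
        [audio[t * hop_length : t * hop_length + FRAME_LENGTH] for t in range(n_frames)]
    )


audio = np.arange(4096, dtype=float)  # sample value == sample index
hop = 160                             # 10 ms at 16 kHz

centered = frame_signal(audio, hop, center=True)
uncentered = frame_signal(audio, hop, center=False)

t = 5
print(centered[t, FRAME_LENGTH // 2] == audio[t * hop])  # middle of the frame
print(uncentered[t, 0] == audio[t * hop])                # start of the frame
```

With centering, the timestamp t * hop_length refers to the middle of the analysis window, which is what most annotation tools assume.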
Model Capacity
CREPE uses the model size that was reported in the paper by default, but can optionally use a smaller model for computation speed, at the cost of slightly lower accuracy. You can specify --model-capacity {tiny|small|medium|large|full} as the command line option to select a model with the desired capacity.
Temporal smoothing
By default CREPE does not apply temporal smoothing to the pitch curve, but Viterbi smoothing is supported via the optional --viterbi command line argument.
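For readers unfamiliar with Viterbi smoothing, the sketch below shows the general idea on a toy activation matrix: each frame's activations act as emission scores, and a transition prior penalizes large jumps between neighboring frames. This is a simplified illustration, not CREPE's implementation; the triangular prior and the max_jump parameter are assumptions made for the example.

```python
import numpy as np


def viterbi_path(activation, max_jump=12):
    """Most likely bin sequence under a smoothness prior.

    activation: (n_frames, n_bins) array of non-negative scores.
    max_jump: hypothetical parameter; jumps of this many bins or more
              receive a near-zero transition probability.
    """
    n_frames, n_bins = activation.shape
    eps = 1e-12
    # Emission log-probabilities: normalize each frame to sum to 1.
    emission = activation / (activation.sum(axis=1, keepdims=True) + eps)
    log_emit = np.log(emission + eps)

    # Triangular transition weights: largest for staying in the same bin.
    idx = np.arange(n_bins)
    weights = np.maximum(max_jump - np.abs(idx[:, None] - idx[None, :]), 0)
    transition = weights / weights.sum(axis=1, keepdims=True)
    log_trans = np.log(transition + eps)

    score = log_emit[0].copy()
    backptr = np.zeros((n_frames, n_bins), dtype=int)
    for t in range(1, n_frames):
        # candidates[i, j]: best score ending in bin i at t-1, then moving to j.
        candidates = score[:, None] + log_trans
        backptr[t] = candidates.argmax(axis=0)
        score = candidates.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the final frame.
    path = np.empty(n_frames, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


# Toy example: a steady pitch at bin 20 with one spurious peak at bin 45.
act = np.zeros((5, 50))
act[:, 20] = 1.0
act[2, 20] = 0.45
act[2, 45] = 0.55

print(act.argmax(axis=1))   # frame-wise argmax follows the spurious peak
print(viterbi_path(act))    # the smoothed path stays on the steady pitch
```

The trade-off is that a strong smoothness prior can also flatten genuine fast pitch changes, which is presumably why smoothing is opt-in.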
Saving the activation matrix
The script can also optionally save the output activation matrix of the model to an npy file (--save-activation), where the matrix dimensions are (n_frames, 360) using a hop size of 10 ms (there are 360 pitch bins covering 20 cents each).
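To interpret a saved activation matrix, each of the 360 bins has to be mapped to a frequency. The sketch below does this for bins spaced 20 cents apart. The offset of the lowest bin (about 1997.38 cents above a 10 Hz reference) is quoted from memory of the CREPE source and should be treated as an assumption, as should the helper name bin_to_hz.

```python
import numpy as np

N_BINS = 360
CENTS_PER_BIN = 20
# Assumed offset of bin 0, in cents relative to 10 Hz (not stated in this README).
CENTS_OFFSET = 1997.3794084376191


def bin_to_hz(bin_index):
    """Map pitch-bin indices to frequencies in Hz."""
    cents = CENTS_PER_BIN * np.asarray(bin_index, dtype=float) + CENTS_OFFSET
    return 10.0 * 2.0 ** (cents / 1200.0)


freqs = bin_to_hz(np.arange(N_BINS))

# Toy activation matrix with two frames peaking at bins 100 and 160.
activation = np.zeros((2, N_BINS))
activation[0, 100] = 1.0
activation[1, 160] = 1.0
peak_hz = freqs[activation.argmax(axis=1)]

print(freqs[0], freqs[-1])     # frequency range covered by the bins
print(peak_hz[1] / peak_hz[0]) # 60 bins apart = 1200 cents = one octave
```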
The script can also output a plot of the activation matrix (--save-plot), saved to audio_file.activation.png, including an optional visual representation of the model's voicing detection (--plot-voicing). Here's an example plot of the activation matrix (without the voicing overlay) for an excerpt of male singing voice:
Batch processing
For batch processing of files, you can provide a folder path instead of a file path:
$ python crepe.py audio_folder
The script will process all WAV files found inside the folder.
Additional usage information
For more information on the usage, please refer to the help message:
$ python crepe.py --help
Using CREPE inside Python
CREPE can be imported as a module to be used directly in Python. Here's a minimal example:
import crepe
from scipy.io import wavfile
sr, audio = wavfile.read('/path/to/audiofile.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
Argmax-local Weighted Averaging
This release of CREPE uses a weighted averaging formula that differs slightly from the one in the paper: the average is taken only over the neighborhood around the maximum activation, which has been shown to further improve pitch accuracy.
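The neighborhood averaging described above can be sketched as follows. The window of 4 bins on either side of the maximum is an assumption of this sketch, and local_weighted_average is a hypothetical helper; bin positions are expressed in cents relative to the lowest bin.

```python
import numpy as np


def local_weighted_average(salience, bin_cents, radius=4):
    """Activation-weighted mean of bin positions around the strongest bin.

    radius: assumed half-width of the neighborhood, in bins.
    """
    center = int(np.argmax(salience))
    start = max(0, center - radius)
    end = min(len(salience), center + radius + 1)
    window = salience[start:end]
    return float(np.sum(window * bin_cents[start:end]) / np.sum(window))


bin_cents = 20.0 * np.arange(360)  # 360 bins, 20 cents apart

salience = np.zeros(360)
salience[100] = 1.0   # main peak
salience[101] = 0.5   # energy leaking into the next bin
salience[300] = 0.9   # distant secondary peak, outside the neighborhood

print(local_weighted_average(salience, bin_cents))
```

Because the distant peak at bin 300 lies outside the window, it does not pull the estimate away from the main peak, which is the point of restricting the average to a local neighborhood.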
Please Note
- The current version only supports WAV files as input.
- The model is trained on 16 kHz audio, so if the input audio has a different sample rate, it will be first resampled to 16 kHz using resampy.
- Due to the subtle numerical differences between frameworks, Keras should be configured to use the TensorFlow backend for the best performance. The model was trained using Keras 2.1.5 and TensorFlow 1.6.0, and newer versions of TensorFlow seem to work as well.
- Prediction is significantly faster if Keras (and the corresponding backend) is configured to run on GPU.
- The provided model is trained using the following datasets, composed of vocal and instrumental audio, and is therefore expected to work best on these types of audio signals.
- MIR-1K [1]
- Bach10 [2]
- RWC-Synth [3]
- MedleyDB [4]
- MDB-STEM-Synth [5]
- NSynth [6]
References
[1] C.-L. Hsu et al. "On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset", IEEE Transactions on Audio, Speech, and Language Processing. 2009.
[2] Z. Duan et al. "Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-Peak Regions", IEEE Transactions on Audio, Speech, and Language Processing. 2010.
[3] M. Mauch et al. "pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2014.
[4] R. M. Bittner et al. "MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research", Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference. 2014.
[5] J. Salamon et al. "An Analysis/Synthesis Framework for Automatic F0 Annotation of Multitrack Datasets", Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference. 2017.
[6] J. Engel et al. "Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders", arXiv preprint: 1704.01279. 2017.