basic-pitch
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Top Related Projects
librosa: Python library for audio and music analysis
aubio: a library for audio and music analysis
Essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
madmom: Python audio and music signal processing library
Quick Overview
Basic Pitch is an open-source Python library and command-line tool developed by Spotify for music transcription. It uses deep learning to convert raw audio into MIDI, providing pitch and timing information for notes in the audio. The project aims to make music transcription accessible and accurate for various applications.
Pros
- High accuracy in pitch detection and note onset/offset timing
- Supports both monophonic and polyphonic audio transcription
- Provides both a Python library and a command-line interface for flexibility
- Open-source and actively maintained by Spotify
Cons
- May require significant computational resources for processing large audio files
- Limited to pitch and timing information; doesn't transcribe other musical elements like dynamics or articulation
- Potential for occasional errors in complex polyphonic passages or with certain instrument timbres
Code Examples
- Transcribing an audio file to MIDI:
from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

# predict() takes the audio path first and returns (model_output, midi_data, note_events)
model_output, midi_data, note_events = predict("path/to/audio.wav", ICASSP_2022_MODEL_PATH)
midi_data.write("output.mid")  # midi_data is a PrettyMIDI object
- Extracting note events from audio:
from basic_pitch.inference import predict_and_save
from basic_pitch import ICASSP_2022_MODEL_PATH

# Takes a list of input paths; note the parameter is save_notes, not save_note_events
predict_and_save(["path/to/audio.wav"], "output/", save_midi=True, sonify_midi=False,
                 save_model_outputs=False, save_notes=True,
                 model_or_model_path=ICASSP_2022_MODEL_PATH)
- Customizing transcription parameters:
from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

model_output, midi_data, note_events = predict(
    "path/to/audio.wav",
    ICASSP_2022_MODEL_PATH,
    onset_threshold=0.5,
    frame_threshold=0.3,
    minimum_note_length=58,
    minimum_frequency=30,
    maximum_frequency=1000,
)
Getting Started
To get started with Basic Pitch, follow these steps:
- Install the library:
pip install basic-pitch
- Import and use the library in your Python script:
from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

model_output, midi_data, note_events = predict("path/to/your/audio.wav", ICASSP_2022_MODEL_PATH)
midi_data.write("transcribed_output.mid")
- For command-line usage, run:
basic-pitch <output-directory> "path/to/your/audio.wav"
This will generate a MIDI file (and, with the optional flags described below, note event files) in the specified output directory.
Competitor Comparisons
librosa: Python library for audio and music analysis
Pros of librosa
- More comprehensive audio analysis toolkit with a wider range of features
- Well-established library with extensive documentation and community support
- Flexible and customizable for various audio processing tasks
Cons of librosa
- Slower processing speed for certain tasks compared to Basic Pitch
- May require more setup and configuration for specific use cases
- Less specialized for pitch detection and music transcription
Code Comparison
librosa:
import librosa
y, sr = librosa.load('audio.wav')
pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
Basic Pitch:
from basic_pitch.inference import predict
model_output, midi_data, note_events = predict('audio.wav')
Summary
librosa is a versatile audio analysis library with a wide range of features, while Basic Pitch focuses specifically on pitch detection and music transcription. librosa offers more flexibility and customization options, but Basic Pitch may provide faster processing for its specialized tasks. The choice between the two depends on the specific requirements of your audio processing project.
aubio: a library for audio and music analysis
Pros of aubio
- More comprehensive audio analysis toolkit with multiple features beyond pitch detection
- Longer development history and established community support
- Supports multiple programming languages and platforms
Cons of aubio
- May have a steeper learning curve due to its broader scope
- Potentially slower performance for specific tasks compared to specialized libraries
Code Comparison
basic-pitch example:
from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

audio_path = "audio.wav"
model_output, midi_data, note_events = predict(audio_path, ICASSP_2022_MODEL_PATH)
aubio example:
import aubio

source = aubio.source("audio.wav", 44100, 2048)  # read hops of 2048 samples at 44.1 kHz
pitch_o = aubio.pitch("yin", 2048, 2048, 44100)
samples, read = source()
pitch = pitch_o(samples)[0]
Summary
Basic Pitch is a specialized tool for polyphonic note transcription, while aubio is a more comprehensive audio analysis library. Basic Pitch may be easier to use for transcription-focused tasks, while aubio offers a wider range of audio processing capabilities. The choice between the two depends on the specific requirements of your project and the breadth of audio analysis features needed.
Essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
Pros of Essentia
- Broader scope: Essentia is a comprehensive audio analysis library covering various aspects of music information retrieval, not limited to pitch detection
- More mature project: Essentia has been in development for longer, offering a wider range of features and algorithms
- Flexibility: Supports multiple programming languages (C++, Python, JavaScript) and can be used in different environments
Cons of Essentia
- Steeper learning curve: Due to its extensive feature set, Essentia may be more complex to use for beginners
- Heavier resource usage: The comprehensive nature of Essentia may result in higher computational requirements compared to Basic Pitch
Code Comparison
Basic Pitch (Python):
from basic_pitch.inference import predict
audio_path = "path/to/audio.wav"
model_output, midi_data, note_events = predict(audio_path)
Essentia (Python):
import essentia.standard as es
audio = es.MonoLoader(filename="path/to/audio.wav")()
pitch, confidence = es.PredominantPitchMelodia()(audio)
Both libraries offer pitch detection functionality, but Essentia's implementation is part of a larger ecosystem of audio analysis tools. Basic Pitch focuses specifically on pitch detection and note transcription, while Essentia provides a more comprehensive set of audio analysis algorithms.
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Pros of CREPE
- Higher accuracy in pitch estimation, especially for monophonic audio
- Faster inference time on GPU
- More extensive documentation and academic research backing
Cons of CREPE
- Larger model size, requiring more computational resources
- Limited to monophonic pitch estimation
- Less integrated with music production workflows
Code Comparison
Basic-pitch example:
from basic_pitch.inference import predict
model_output, midi_data, note_events = predict('audio.wav')
CREPE example:
import crepe
from scipy.io import wavfile
sr, audio = wavfile.read("audio.wav")  # crepe.predict takes audio samples, not a file path
time, frequency, confidence, activation = crepe.predict(audio, sr)
Both repositories focus on pitch estimation from audio, but they have different approaches and use cases. Basic-pitch is designed for polyphonic audio and integrates well with music production tools, while CREPE excels in monophonic pitch estimation with higher accuracy. Basic-pitch offers a more comprehensive solution for music analysis and transcription, including MIDI output, while CREPE provides raw pitch and confidence values. The choice between the two depends on the specific requirements of the project, such as polyphonic vs. monophonic analysis, integration needs, and available computational resources.
madmom: Python audio and music signal processing library
Pros of madmom
- Broader range of music information retrieval (MIR) tasks, including beat tracking, onset detection, and chord recognition
- More established project with a longer history and wider community adoption
- Provides both high-level and low-level interfaces for flexibility in usage
Cons of madmom
- Less focused on pitch detection compared to Basic Pitch
- May have a steeper learning curve due to its broader scope
- Potentially slower processing speed for specific tasks like pitch detection
Code Comparison
madmom example:
from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor
beat_activations = RNNBeatProcessor()("audio.wav")  # frame-wise beat activation function
beats = DBNBeatTrackingProcessor(fps=100)(beat_activations)  # beat times in seconds
Basic Pitch example:
from basic_pitch.inference import predict
model_output, midi_data, note_events = predict("audio.wav")
Summary
madmom is a comprehensive MIR library offering a wide range of audio analysis tools, while Basic Pitch focuses specifically on pitch detection. madmom provides more flexibility and broader functionality, but may be more complex to use. Basic Pitch offers a simpler, more targeted approach to pitch detection, potentially with faster processing for this specific task.
README
Basic Pitch is a Python library for Automatic Music Transcription (AMT), using a lightweight neural network developed by Spotify's Audio Intelligence Lab. It's small, easy to use, pip install-able, and npm install-able via its sibling repo.
Basic Pitch may be simple, but it's far from "basic"! basic-pitch is efficient and easy to use, and its multipitch support, its ability to generalize across instruments, and its note accuracy compete with much larger and more resource-hungry AMT systems.
Provide a compatible audio file and basic-pitch will generate a MIDI file, complete with pitch bends. Basic Pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic Pitch works best on one instrument at a time.
Research Paper
This library was released in conjunction with Spotify's publication at ICASSP 2022. You can read more about this research in the paper, A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation.
If you use this library in academic research, consider citing it:
@inproceedings{2022_BittnerBRME_LightweightNoteTranscription_ICASSP,
author= {Bittner, Rachel M. and Bosch, Juan Jos\'e and Rubinstein, David and Meseguer-Brocal, Gabriel and Ewert, Sebastian},
title= {A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation},
booktitle= {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
address= {Singapore},
year= 2022,
}
Note that we have improved Basic Pitch beyond what was presented in this paper. Therefore, if you use the output of Basic Pitch in academic research, we recommend that you cite the version of the code that was used.
Demo
If, for whatever reason, you're not yet completely inspired, or you're just like so totally over the general vibe and stuff, check out our snappy demo website, basicpitch.io, to experiment with our model on whatever music audio you provide!
Installation
basic-pitch is available via PyPI. To install the current release:
pip install basic-pitch
To update Basic Pitch to the latest version, add --upgrade to the above command.
Compatible Environments:
- MacOS, Windows and Ubuntu operating systems
- Python versions 3.7, 3.8, 3.9, 3.10, 3.11
- For Mac M1 hardware, we currently only support Python version 3.10. Otherwise, we suggest using a virtual machine.
Model Runtime
Basic Pitch comes with the original TensorFlow model as well as conversions of it to CoreML, TensorFlowLite, and ONNX. By default, Basic Pitch will not install TensorFlow as a dependency unless you are using Python>=3.11. Instead, by default, CoreML will be installed on macOS, TensorFlowLite will be installed on Linux, and ONNX will be installed on Windows. If you want to install TensorFlow along with the default model inference runtime, you can do so via pip install basic-pitch[tf].
Usage
Model Prediction
Model Runtime
By default, Basic Pitch will attempt to load a model in the following order:
- TensorFlow
- CoreML
- TensorFlowLite
- ONNX
Additionally, the module variable ICASSP_2022_MODEL_PATH will default to the first available version in the list.
We will explain how to override this priority list below. Because all other model serializations were converted from TensorFlow, we recommend using TensorFlow when possible. N.B. Basic Pitch does not install TensorFlow by default to save the user time when installing and running Basic Pitch.
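One way to override this programmatically is to pass the path of a specific serialized model directly to predict (a minimal sketch; the placeholder path stands in for wherever the serialization you want lives on disk):
from basic_pitch.inference import predict

# Pass an explicit serialized-model path (placeholder) instead of the default
# ICASSP_2022_MODEL_PATH; point it at the ONNX, CoreML, TensorFlowLite,
# or TensorFlow model file you want to use.
model_output, midi_data, note_events = predict(
    "audio.wav",
    "path/to/serialized/model",
)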
Command Line Tool
This library offers a command line tool interface. A basic prediction command will generate and save a MIDI file transcription of the audio at the <input-audio-path> to the <output-directory>:
basic-pitch <output-directory> <input-audio-path>
For example:
basic-pitch /output/directory/path /input/audio/path
To process more than one audio file at a time:
basic-pitch <output-directory> <input-audio-path-1> <input-audio-path-2> <input-audio-path-3>
Optionally, you may append any of the following flags to your prediction command to save additional formats of the prediction output to the <output-directory>:
- --sonify-midi to additionally save a .wav audio rendering of the MIDI file.
- --save-model-outputs to additionally save raw model outputs as an NPZ file.
- --save-note-events to additionally save the predicted note events as a CSV file.
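For example, to transcribe audio.wav into output/ and also keep the sonified rendering and the note-event CSV (the file names here are illustrative):
basic-pitch output/ audio.wav --sonify-midi --save-note-events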
If you want to use a non-default model type (e.g., use CoreML instead of TF), use the --model-serialization argument. The CLI will change the loaded model to the type you prefer.
To discover more parameter control, run:
basic-pitch --help
Programmatic
predict()
Import basic-pitch into your own Python code and run the predict function directly, providing an <input-audio-path> and returning the model's prediction results:
from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH
model_output, midi_data, note_events = predict(<input-audio-path>)
- <minimum-frequency> & <maximum-frequency> (floats) set the minimum and maximum allowed note frequency, in Hz, returned by the model. Pitch events with frequencies outside of this range will be excluded from the prediction results.
- model_output is the raw model inference output
- midi_data is the transcribed MIDI data derived from the model_output
- note_events is a list of note events derived from the model_output
Note: As mentioned previously, ICASSP_2022_MODEL_PATH will default to the first supported runtime in the order TensorFlow, CoreML, TensorFlowLite, ONNX.
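For example, you can constrain the detectable note range and inspect the returned values (a minimal sketch; the exact tuple layout of each note event is an assumption, so print one before relying on it):
from basic_pitch.inference import predict

model_output, midi_data, note_events = predict(
    "audio.wav",
    minimum_frequency=60.0,    # ignore pitches below ~B1
    maximum_frequency=2000.0,  # ignore pitches above ~B6
)

midi_data.write("audio_transcribed.mid")  # midi_data is a PrettyMIDI object
for event in note_events:
    # Assumed layout: (start_s, end_s, midi_pitch, amplitude, pitch_bends)
    print(event)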
predict() in a loop
To run prediction within a loop, you'll want to load the model yourself and provide predict() with the loaded model object for repeated prediction calls, in order to avoid redundant and sluggish model loading.
from basic_pitch.inference import predict, Model
from basic_pitch import ICASSP_2022_MODEL_PATH

basic_pitch_model = Model(ICASSP_2022_MODEL_PATH)  # load the model once, outside the loop

for x in range(<number-of-audio-files>):
    ...
    model_output, midi_data, note_events = predict(
        <loop-x-input-audio-path>,
        basic_pitch_model,
    )
    ...
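A concrete version of the same pattern, assuming a recordings/ directory of WAV files (the directory name and output naming are illustrative):
import pathlib
from basic_pitch.inference import predict, Model
from basic_pitch import ICASSP_2022_MODEL_PATH

basic_pitch_model = Model(ICASSP_2022_MODEL_PATH)  # loaded once, reused below

for audio_path in sorted(pathlib.Path("recordings/").glob("*.wav")):
    model_output, midi_data, note_events = predict(str(audio_path), basic_pitch_model)
    midi_data.write(str(audio_path.with_suffix(".mid")))  # e.g. recordings/take1.mid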
predict_and_save()
If you would like basic-pitch to orchestrate the generation and saving of our various supported output file types, you may use predict_and_save instead of using predict directly:
from basic_pitch.inference import predict_and_save
predict_and_save(
<input-audio-path-list>,
<output-directory>,
<save-midi>,
<sonify-midi>,
<save-model-outputs>,
<save-notes>,
)
where:
- <input-audio-path-list> - a list of audio file paths for basic-pitch to read from
- <output-directory> - a directory path for basic-pitch to write to
- <save-midi> - bool to control generating and saving a MIDI file to the <output-directory>
- <sonify-midi> - bool to control saving a WAV audio rendering of the MIDI file to the <output-directory>
- <save-model-outputs> - bool to control saving the raw model output as an NPZ file to the <output-directory>
- <save-notes> - bool to control saving predicted note events as a CSV file to the <output-directory>
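Putting it together, a call might look like this (the file and directory names are illustrative; recent versions also take the model, or a path to it, as the next argument, shown here with the default):
from basic_pitch.inference import predict_and_save
from basic_pitch import ICASSP_2022_MODEL_PATH

predict_and_save(
    ["song1.wav", "song2.wav"],  # <input-audio-path-list>
    "transcriptions/",           # <output-directory>
    True,                        # <save-midi>
    False,                       # <sonify-midi>
    False,                       # <save-model-outputs>
    True,                        # <save-notes>
    ICASSP_2022_MODEL_PATH,      # model path (an argument in recent versions)
)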
Model Input
Supported Audio Codecs
basic-pitch accepts all sound files that are compatible with its version of librosa, including:
.mp3
.ogg
.wav
.flac
.m4a
Mono Channel Audio Only
While you may use stereo audio as an input to our model, at prediction time, the channels of the input will be down-mixed to mono, and then analyzed and transcribed.
File Size/Audio Length
This model can process any size or length of audio, but processing of larger/longer audio files could be limited by your machine's available disk space. To process these files, we recommend streaming the audio of the file, processing windows of audio at a time.
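One possible windowing approach, outside of basic-pitch itself, uses the soundfile package (the window length and file names are illustrative; note that a note sustained across a window boundary will be split):
import soundfile as sf
from basic_pitch.inference import predict, Model
from basic_pitch import ICASSP_2022_MODEL_PATH

model = Model(ICASSP_2022_MODEL_PATH)
info = sf.info("long_recording.wav")
blocksize = int(info.samplerate * 60)  # 60-second windows (assumption)

for i, block in enumerate(sf.blocks("long_recording.wav", blocksize=blocksize)):
    chunk_path = f"chunk_{i}.wav"
    sf.write(chunk_path, block, info.samplerate)  # write the window to disk
    model_output, midi_data, note_events = predict(chunk_path, model)
    midi_data.write(f"chunk_{i}.mid")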
Sample Rate
Input audio may be of any sample rate; however, all audio will be resampled to 22050 Hz before processing.
VST
Thanks to DamRsn for developing this working VST version of basic-pitch! - https://github.com/DamRsn/NeuralNote
Contributing
Contributions to basic-pitch are welcomed! See CONTRIBUTING.md for details.
Copyright and License
basic-pitch is Copyright 2022 Spotify AB.
This software is licensed under the Apache License, Version 2.0 (the "Apache License"). You may use this software only in compliance with the terms of the Apache License.
You may obtain a copy of the Apache License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the Apache License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Apache License for the specific language governing permissions and limitations under the Apache License.