Top Related Projects
madmom: Python audio and music signal processing library
librosa: Python library for audio and music analysis
essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
aubio: a library for audio and music analysis
pedalboard: 🎛 🔊 A Python library for audio.
Quick Overview
nnAudio is a PyTorch-based audio processing library that implements various audio processing methods as neural network layers. It allows for GPU acceleration of audio signal processing tasks, making it particularly useful for deep learning applications involving audio data.
Pros
- GPU acceleration for faster audio processing
- Seamless integration with PyTorch neural networks
- Implements various audio transforms as neural network layers
- Generates spectrograms on the fly, for example during neural network training
Cons
- Requires PyTorch, which may not be suitable for all projects
- Best performance requires a GPU, although the layers also run on CPU
- May have a steeper learning curve compared to traditional audio processing libraries
- Documentation could be more comprehensive for some advanced features
Code Examples
- Loading and using a Short-time Fourier Transform (STFT) layer:
import torch
from nnAudio import Spectrogram
# Create an STFT layer
stft = Spectrogram.STFT(n_fft=2048, hop_length=512)
# Generate a random audio signal
audio = torch.randn(1, 44100)
# Apply STFT
spectrogram = stft(audio)
- Creating a Mel spectrogram:
import torch
from nnAudio import Spectrogram
# Create a Mel spectrogram layer
mel_spec = Spectrogram.MelSpectrogram(sr=22050, n_fft=2048, n_mels=128)
# Generate a random audio signal
audio = torch.randn(1, 22050)
# Compute Mel spectrogram
mel_spectrogram = mel_spec(audio)
- Applying a Constant-Q Transform:
import torch
from nnAudio import Spectrogram
# Create a Constant-Q Transform layer
cqt = Spectrogram.CQT(sr=22050, hop_length=512, n_bins=84, bins_per_octave=12)
# Generate a random audio signal
audio = torch.randn(1, 22050)
# Compute CQT
cqt_spectrogram = cqt(audio)
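The parameters used in the examples above determine the size of each output, and that can be worked out before running any layer. The sketch below is plain Python and rests on two assumptions that the text does not state: frames are centred (the signal is padded by n_fft // 2 samples on each side), and the CQT starts at fmin = 32.70 Hz (roughly the note C1). Both helper names are hypothetical, and the real layers wrap these numbers in a batch dimension (and possibly a real/imaginary dimension).

```python
def stft_output_shape(n_samples, n_fft, hop_length, center=True):
    """Return (n_freq_bins, n_frames) for a one-sided STFT."""
    n_freq_bins = n_fft // 2 + 1
    if center:
        # Assumption: centred framing pads n_fft // 2 samples on each side.
        n_samples = n_samples + 2 * (n_fft // 2)
    n_frames = 1 + (n_samples - n_fft) // hop_length
    return n_freq_bins, n_frames


def cqt_center_frequencies(fmin, n_bins, bins_per_octave):
    """Geometrically spaced centre frequencies: f_k = fmin * 2 ** (k / B)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# STFT example: 44100 samples, n_fft=2048, hop_length=512
print(stft_output_shape(44100, n_fft=2048, hop_length=512))  # (1025, 87)

# CQT example: 84 bins at 12 bins per octave span 7 octaves.
# fmin=32.70 is an assumed starting frequency, not taken from the text.
freqs = cqt_center_frequencies(fmin=32.70, n_bins=84, bins_per_octave=12)
nyquist = 22050 / 2
print(round(freqs[0], 2), round(freqs[-1], 2), freqs[-1] < nyquist)
```

The last check matters in practice: if the highest CQT bin sits above the Nyquist frequency (sr / 2), the chosen n_bins does not fit the sample rate.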
Getting Started
To get started with nnAudio, follow these steps:
- Install nnAudio using pip:
pip install nnAudio
- Import the necessary modules in your Python script:
import torch
from nnAudio import Spectrogram
- Create an audio processing layer and apply it to your audio data:
# Create a Mel spectrogram layer
mel_spec = Spectrogram.MelSpectrogram(sr=22050, n_fft=2048, n_mels=128)
# Load or generate your audio data
audio = torch.randn(1, 22050)  # Replace with your actual audio data
# Compute Mel spectrogram
mel_spectrogram = mel_spec(audio)
- Use the processed audio data in your PyTorch neural network or further analysis.
Competitor Comparisons
Python audio and music signal processing library
Pros of madmom
- More comprehensive audio analysis toolkit with features like beat tracking, onset detection, and chord recognition
- Established project with longer development history and wider adoption in MIR research
- Supports both offline and real-time processing of audio signals
Cons of madmom
- Less focus on GPU acceleration and deep learning integration
- Steeper learning curve due to more complex architecture and broader feature set
- Slower development pace with less frequent updates
Code Comparison
madmom example:
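from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor
# RNNBeatProcessor yields a frame-wise beat activation function, not beat times
activations = RNNBeatProcessor()(audio_file)
# A tracking processor decodes beat times (in seconds) from the activations
beats = DBNBeatTrackingProcessor(fps=100)(activations)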
nnAudio example:
from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spectrogram = spec_layer(audio_tensor)
madmom provides higher-level audio analysis functions, while nnAudio focuses on efficient spectrogram computation using neural network layers. nnAudio's approach allows for easier integration with deep learning pipelines and GPU acceleration.
Python library for audio and music analysis
Pros of librosa
- Comprehensive audio processing library with a wide range of features
- Well-established and widely used in the audio research community
- Extensive documentation and examples available
Cons of librosa
- CPU-based processing, which can be slower for large-scale operations
- Not optimized for GPU acceleration or deep learning integration
Code Comparison
librosa:
import librosa
y, sr = librosa.load('audio.wav')
stft = librosa.stft(y)
mel_spec = librosa.feature.melspectrogram(y=y, sr=sr)
nnAudio:
from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
mel_layer = Spectrogram.MelSpectrogram()
stft = spec_layer(audio_tensor)
mel_spec = mel_layer(audio_tensor)
Key Differences
- nnAudio is designed for GPU acceleration and seamless integration with PyTorch
- librosa offers a broader range of audio processing functions
- nnAudio focuses on spectrogram generation as neural network layers
- librosa provides more flexibility for customizing audio analysis parameters
- nnAudio is better suited for deep learning workflows, while librosa excels in general audio processing tasks
Both libraries have their strengths, and the choice between them depends on the specific requirements of your audio processing project and whether GPU acceleration is necessary.
C++ library for audio and music analysis, description and synthesis, including Python bindings
Pros of essentia
- Comprehensive library with a wide range of audio analysis algorithms
- Supports multiple programming languages (C++, Python, JavaScript)
- Well-documented and actively maintained by a research institution
Cons of essentia
- Steeper learning curve due to its extensive feature set
- Requires more setup and configuration compared to nnAudio
- May be overkill for simple audio processing tasks
Code Comparison
nnAudio (PyTorch-based):
from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spec = spec_layer(audio_tensor)
essentia (Python bindings):
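import essentia.standard as es
audio = es.MonoLoader(filename='audio.wav')()
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
# Magnitude spectrum of each windowed frame
frames = es.FrameGenerator(audio, frameSize=2048, hopSize=512)
spectrogram = [spectrum(w(frame)) for frame in frames]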
Summary
essentia is a comprehensive audio analysis library with support for multiple languages, making it suitable for complex audio processing tasks. However, it may have a steeper learning curve and require more setup than nnAudio. nnAudio, being PyTorch-based, offers a simpler interface for basic audio processing tasks, especially when working with neural networks. The choice between the two depends on the specific requirements of your project and your familiarity with audio processing concepts.
a library for audio and music analysis
Pros of aubio
- Mature and well-established library with a long history of development
- Supports a wide range of audio analysis tasks beyond spectrograms
- Offers bindings for multiple programming languages
Cons of aubio
- Less focused on neural network-based audio processing
- May have a steeper learning curve for users primarily interested in spectrogram generation
- Potentially slower for certain operations compared to GPU-accelerated alternatives
Code Comparison
aubio:
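import aubio
win_s = 512
hop_s = win_s // 2
src = aubio.source('input.wav', samplerate=44100, hop_size=hop_s)
pv = aubio.pvoc(win_s, hop_s)
specgram = []
while True:
    samples, read = src()
    # Keep the magnitude of each phase-vocoder frame
    specgram.append(pv(samples).norm.copy())
    if read < hop_s:
        break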
nnAudio:
import torch
from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT(n_fft=512, hop_length=256)
audio = torch.from_numpy(audio_array).float()  # cast to float32 to match the layer's kernels
spectrogram = spec_layer(audio)
Both libraries offer spectrogram generation, but nnAudio is more focused on integrating with neural network workflows using PyTorch, while aubio provides a broader set of audio analysis tools. nnAudio may be more suitable for deep learning projects, while aubio excels in traditional audio processing tasks.
🎛 🔊 A Python library for audio.
Pros of pedalboard
- Broader audio processing capabilities, including effects like reverb and distortion
- Designed for real-time audio processing, suitable for live applications
- Backed by Spotify, potentially offering better long-term support and updates
Cons of pedalboard
- Less focused on neural network audio processing tasks
- May have a steeper learning curve for users primarily interested in audio analysis
Code Comparison
nnAudio example:
from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spec = spec_layer(audio_tensor)
pedalboard example:
from pedalboard import Pedalboard, Reverb, Distortion
board = Pedalboard([Reverb(), Distortion()])
effected = board(audio_array, sample_rate)
Summary
nnAudio focuses on audio analysis and spectrograms for neural networks, while pedalboard offers a wider range of audio processing tools. nnAudio may be more suitable for researchers working on audio-based machine learning tasks, whereas pedalboard is better suited for general audio manipulation and effects processing. The choice between the two depends on the specific requirements of the project and the user's familiarity with audio processing concepts.
README
nnAudio
nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend. As a result, spectrograms can be generated from audio on the fly during neural network training, and the Fourier kernels (or, for example, the CQT kernels) can be trained. Kapre follows a similar concept, using 1D convolutional neural networks to extract spectrograms, but is based on Keras.
Other GPU audio processing tools include torchaudio and tf.signal. However, they do not use the neural network approach, so their Fourier basis cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install on Windows because of its sox dependency. nnAudio is more compatible across operating systems since it relies mostly on PyTorch convolutional neural networks. The name nnAudio comes from torch.nn.
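The idea described above, computing a spectrogram with a 1D convolution whose filters are Fourier kernels, can be sketched in plain PyTorch. This is an illustration under stated assumptions, not nnAudio's actual implementation: the class name ConvSTFT, the Hann window, the lack of padding, and the default sizes are all choices made for this example.

```python
import math

import torch
import torch.nn.functional as F


class ConvSTFT(torch.nn.Module):
    """Magnitude STFT computed by a strided 1D convolution (illustrative only)."""

    def __init__(self, n_fft=512, hop_length=256, trainable=False):
        super().__init__()
        self.hop_length = hop_length
        n_bins = n_fft // 2 + 1
        # Build the kernels in float64 for accuracy, then store them as float32.
        n = torch.arange(n_fft, dtype=torch.float64)
        k = torch.arange(n_bins, dtype=torch.float64).unsqueeze(1)
        phase = 2 * math.pi * k * n / n_fft  # shape (n_bins, n_fft)
        window = torch.hann_window(n_fft, dtype=torch.float64)
        # Shape (n_bins, 1, n_fft): one convolution filter per frequency bin.
        real = (torch.cos(phase) * window).unsqueeze(1).float()
        imag = (-torch.sin(phase) * window).unsqueeze(1).float()
        if trainable:
            # Parameters receive gradients, so the Fourier basis can be trained.
            self.kernel_real = torch.nn.Parameter(real)
            self.kernel_imag = torch.nn.Parameter(imag)
        else:
            # Buffers move with the module (CPU/GPU) but stay fixed.
            self.register_buffer("kernel_real", real)
            self.register_buffer("kernel_imag", imag)

    def forward(self, audio):
        # audio: (batch, samples) -> (batch, 1, samples) for conv1d
        x = audio.unsqueeze(1)
        real = F.conv1d(x, self.kernel_real, stride=self.hop_length)
        imag = F.conv1d(x, self.kernel_imag, stride=self.hop_length)
        # Small epsilon keeps the gradient of sqrt finite at zero magnitude.
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-8)


layer = ConvSTFT(n_fft=512, hop_length=256, trainable=True)
audio = torch.randn(2, 4096)
spec = layer(audio)       # (batch, n_bins, n_frames)
spec.mean().backward()    # gradients flow back into the Fourier kernels
print(spec.shape, layer.kernel_real.grad is not None)
```

With trainable=False the kernels are registered as buffers and stay fixed, which corresponds to a conventional STFT. Because the layer is an ordinary torch.nn.Module, calling .to("cuda") on it moves the computation to the GPU.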
Installation
pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation
or
pip install nnAudio==0.3.1
Documentation
https://kinwaicheuk.github.io/nnAudio/index.html
Comparison with other libraries
The README compares nnAudio with torch.stft, kapre, torchaudio, tf.signal, torch-stft, and librosa across the following features:
- Trainable
- Differentiable
- Linear frequency STFT
- Logarithmic frequency STFT
- Inverse STFT
- Griffin-Lim
- Mel
- MFCC
- CQT
- VQT
- Gammatone
- CFP¹
- GPU support
Each library is marked per feature as fully supported, in development (only available in the dev version), or not supported. The individual marks are not recoverable from this copy of the table.
¹ Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music
News & Changelog
To view the full changelog, please go to CHANGELOG.md
version 0.3.1 (24 Dec 2021):
- Added VQT feature #113
version 0.3.0 (19 Nov 2021):
- Changed module naming. nnAudio.Spectrogram will be replaced by nnAudio.features in future releases. Currently, the various spectrogram types are accessible via both modules.
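Since both module names currently work and the older one is due to be replaced, code that has to run across nnAudio versions can try the newer name first. The following is a minimal sketch; the helper name load_spectrogram_module is hypothetical, and it returns None when nnAudio is not installed at all.

```python
import importlib


def load_spectrogram_module():
    """Return nnAudio's spectrogram module, preferring the newer name."""
    for name in ("nnAudio.features", "nnAudio.Spectrogram"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # fall back to the next candidate name
    return None  # nnAudio is not installed


spectrogram_module = load_spectrogram_module()
if spectrogram_module is None:
    print("nnAudio is not installed")
else:
    print("Using", spectrogram_module.__name__)
```

Layers can then be created through the returned module, for example spectrogram_module.STFT(n_fft=2048, hop_length=512), regardless of which name was found.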
How to cite nnAudio
The paper for nnAudio is available on IEEE Access:
K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.
BibTeX
@ARTICLE{9174990,
  author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
  journal={IEEE Access},
  title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks},
  year={2020},
  volume={8},
  number={},
  pages={161981-162003},
  doi={10.1109/ACCESS.2020.3019084}}
Call for Contributions
nnAudio is a fast-growing package. With the increasing number of feature requests, we welcome anyone familiar with digital signal processing and neural networks to contribute to nnAudio. The current list of pending features includes:
- Invertible Constant Q Transform (CQT)
(Quick tip for unit tests: cd into the Installation folder, then run pytest. You need at least 1931 MiB of GPU memory to pass all the unit tests.)
Alternatively, you may also contribute by:
- Making a better demonstration code or tutorial
Dependencies
Numpy >= 1.14.5
Scipy >= 1.2.0
PyTorch >= 1.6.0 (Griffin-Lim only available after 1.6.0)
Python >= 3.6
librosa = 0.7.0 (In theory nnAudio depends on librosa, but it only needs a single function, mel, from librosa.filters. To save users the trouble of installing librosa for this one function, I copied the chunk of functions corresponding to mel into my code, so nnAudio runs without librosa installed.)
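The minimum versions listed above can be checked against the current environment before installing nnAudio. The sketch below uses only the standard library and needs Python 3.8 or later for importlib.metadata; the helper names and the simplified version parsing are assumptions made for this example, and "torch" is the distribution name under which PyTorch is installed.

```python
import sys
from importlib import metadata  # available from Python 3.8

# Minimum versions from the list above; keys are distribution names.
MINIMUMS = {"numpy": "1.14.5", "scipy": "1.2.0", "torch": "1.6.0"}


def parse_version(text):
    """Turn '1.13.1+cu117' into (1, 13, 1), using the leading digits of each part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.6' compares equal to '1.6.0'.
    return tuple(parts + [0] * max(0, 3 - len(parts)))


def check_dependencies(minimums=MINIMUMS):
    """Return {name: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = (installed, ok)
    return report


print("Python >= 3.6:", sys.version_info >= (3, 6))
for name, (installed, ok) in check_dependencies().items():
    status = "OK" if ok else "missing or below minimum"
    print(name, installed or "not installed", status)
```

The parsing deliberately ignores pre-release and local version suffixes, which is enough for a quick sanity check but not a full implementation of Python's version ordering rules.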
Other similar libraries