nnAudio

Audio processing using PyTorch 1D convolutional networks

1,073


Top Related Projects

madmom

1,452

Python audio and music signal processing library

librosa

7,578

Python library for audio and music analysis

essentia

3,106

C++ library for audio and music analysis, description and synthesis, including Python bindings

aubio

3,427

a library for audio and music analysis

Quick Overview

nnAudio is a PyTorch-based audio processing library that implements various audio processing methods as neural network layers. It allows for GPU acceleration of audio signal processing tasks, making it particularly useful for deep learning applications involving audio data.

Pros

GPU acceleration for faster audio processing
Seamless integration with PyTorch neural networks
Implements various audio transforms as neural network layers
Generates spectrograms on the fly during neural network training

Cons

Requires PyTorch, which may not be suitable for all projects
Limited to GPU-enabled environments for optimal performance
May have a steeper learning curve compared to traditional audio processing libraries
Documentation could be more comprehensive for some advanced features

Code Examples

Loading and using a Short-time Fourier Transform (STFT) layer:

import torch
from nnAudio import Spectrogram

# Create an STFT layer
stft = Spectrogram.STFT(n_fft=2048, hop_length=512)

# Generate a random audio signal
audio = torch.randn(1, 44100)

# Apply STFT
spectrogram = stft(audio)
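
The size of the result follows from standard STFT framing arithmetic. The sketch below is plain Python and does not call nnAudio; the helper name `stft_output_shape` is hypothetical, and it assumes the layer pads the signal by `n_fft // 2` on each side (centered frames), which should be checked against the library documentation.

```python
def stft_output_shape(n_samples, n_fft, hop_length, center=True):
    """Return (frequency bins, time frames) for a one-sided STFT."""
    n_freq_bins = n_fft // 2 + 1           # one-sided spectrum of a real signal
    if center:
        n_samples += 2 * (n_fft // 2)      # assumed padding on both ends
    n_frames = 1 + (n_samples - n_fft) // hop_length
    return n_freq_bins, n_frames


# Same settings as the STFT example above: 44100 samples, n_fft=2048, hop_length=512
shape_centered = stft_output_shape(44100, n_fft=2048, hop_length=512)
shape_uncentered = stft_output_shape(44100, n_fft=2048, hop_length=512, center=False)
print(shape_centered, shape_uncentered)
```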

Creating a Mel spectrogram:

import torch
from nnAudio import Spectrogram

# Create a Mel spectrogram layer
mel_spec = Spectrogram.MelSpectrogram(sr=22050, n_fft=2048, n_mels=128)

# Generate a random audio signal
audio = torch.randn(1, 22050)

# Compute Mel spectrogram
mel_spectrogram = mel_spec(audio)

Applying a Constant-Q Transform:

import torch
from nnAudio import Spectrogram

# Create a Constant-Q Transform layer
cqt = Spectrogram.CQT(sr=22050, hop_length=512, n_bins=84, bins_per_octave=12)

# Generate a random audio signal
audio = torch.randn(1, 22050)

# Compute CQT
cqt_spectrogram = cqt(audio)
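
To see what `n_bins=84` and `bins_per_octave=12` mean in practice, the center frequencies of a constant-Q transform can be computed by hand: each bin is a fixed ratio above the previous one. This is a plain-Python sketch, not nnAudio code; the helper name is hypothetical and the lowest frequency of 32.70 Hz (the note C1) is an assumption, so check the `fmin` setting of the layer you actually use.

```python
def cqt_center_frequencies(fmin, n_bins, bins_per_octave):
    """Geometrically spaced center frequencies: f_k = fmin * 2 ** (k / bins_per_octave)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# fmin=32.70 Hz is an assumed starting frequency, not taken from the example above
freqs = cqt_center_frequencies(fmin=32.70, n_bins=84, bins_per_octave=12)
n_octaves = len(freqs) / 12          # 84 bins at 12 per octave span 7 octaves
nyquist = 22050 / 2                  # highest representable frequency at sr=22050
print(f"lowest {freqs[0]:.2f} Hz, highest {freqs[-1]:.2f} Hz, Nyquist {nyquist:.0f} Hz")
```

The highest bin has to stay below the Nyquist frequency, which is why the number of bins and the sampling rate cannot be chosen independently.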

Getting Started

To get started with nnAudio, follow these steps:

Install nnAudio using pip:
```
pip install nnAudio
```

Import the necessary modules in your Python script:

import torch
from nnAudio import Spectrogram

Create an audio processing layer and apply it to your audio data:

# Create a Mel spectrogram layer
mel_spec = Spectrogram.MelSpectrogram(sr=22050, n_fft=2048, n_mels=128)

# Load or generate your audio data
audio = torch.randn(1, 22050)  # Replace with your actual audio data

# Compute Mel spectrogram
mel_spectrogram = mel_spec(audio)

Use the processed audio data in your PyTorch neural network or further analysis.
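
As a rough illustration of that last step, the sketch below feeds a spectrogram-shaped tensor into a small PyTorch classifier. The class `SpectrogramClassifier`, its layer sizes, and the number of classes are all hypothetical, and a random tensor stands in for the Mel spectrogram so that the block runs without nnAudio; the assumed input layout is (batch, n_mels, frames).

```python
import torch
from torch import nn


class SpectrogramClassifier(nn.Module):
    """Tiny CNN over a (batch, n_mels, frames) spectrogram. Sizes are illustrative."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),     # collapse frequency and time axes
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, spec):
        x = spec.unsqueeze(1)             # add channel axis: (batch, 1, n_mels, frames)
        x = self.features(x).flatten(1)   # (batch, 16)
        return self.head(x)               # (batch, n_classes)


# Stand-in for the output of a Mel spectrogram layer (assumed shape)
mel_spectrogram = torch.rand(1, 128, 44)
model = SpectrogramClassifier(n_classes=10)
logits = model(mel_spectrogram)
```

Because the spectrogram layer is itself a PyTorch module, it could also be stored as an attribute of the model and called at the start of `forward`, so that raw audio goes in and the whole pipeline runs on one device.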

Competitor Comparisons

madmom

1,452

Python audio and music signal processing library

Pros of madmom

More comprehensive audio analysis toolkit with features like beat tracking, onset detection, and chord recognition
Established project with longer development history and wider adoption in MIR research
Supports both offline and real-time processing of audio signals

Cons of madmom

Less focus on GPU acceleration and deep learning integration
Steeper learning curve due to more complex architecture and broader feature set
Slower development pace with less frequent updates

Code Comparison

madmom example:

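from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor
# 'audio.wav' is a placeholder path
activations = RNNBeatProcessor()('audio.wav')  # frame-wise beat activation function
beats = DBNBeatTrackingProcessor(fps=100)(activations)  # beat times in seconds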

nnAudio example:

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spectrogram = spec_layer(audio_tensor)

madmom provides higher-level audio analysis functions, while nnAudio focuses on efficient spectrogram computation using neural network layers. nnAudio's approach allows for easier integration with deep learning pipelines and GPU acceleration.

librosa

7,578

Python library for audio and music analysis

Pros of librosa

Comprehensive audio processing library with a wide range of features
Well-established and widely used in the audio research community
Extensive documentation and examples available

Cons of librosa

CPU-based processing, which can be slower for large-scale operations
Not optimized for GPU acceleration or deep learning integration

Code Comparison

librosa:

import librosa
y, sr = librosa.load('audio.wav')
stft = librosa.stft(y)
mel_spec = librosa.feature.melspectrogram(y=y, sr=sr)

nnAudio:

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
mel_layer = Spectrogram.MelSpectrogram()
stft = spec_layer(audio_tensor)
mel_spec = mel_layer(audio_tensor)

Key Differences

nnAudio is designed for GPU acceleration and seamless integration with PyTorch
librosa offers a broader range of audio processing functions
nnAudio focuses on spectrogram generation as neural network layers
librosa provides more flexibility for customizing audio analysis parameters
nnAudio is better suited for deep learning workflows, while librosa excels in general audio processing tasks

Both libraries have their strengths, and the choice between them depends on the specific requirements of your audio processing project and whether GPU acceleration is necessary.

essentia

3,106

C++ library for audio and music analysis, description and synthesis, including Python bindings

Pros of essentia

Comprehensive library with a wide range of audio analysis algorithms
Supports multiple programming languages (C++, Python, JavaScript)
Well-documented and actively maintained by a research institution

Cons of essentia

Steeper learning curve due to its extensive feature set
Requires more setup and configuration compared to nnAudio
May be overkill for simple audio processing tasks

Code Comparison

nnAudio (PyTorch-based):

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spec = spec_layer(audio_tensor)

essentia (Python bindings):

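import essentia.standard as es
audio = es.MonoLoader(filename='audio.wav')()  # 'audio.wav' is a placeholder path
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
# Magnitude spectrum of each windowed frame
frames = es.FrameGenerator(audio, frameSize=2048, hopSize=512)
spectrogram = [spectrum(w(frame)) for frame in frames]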

Summary

essentia is a comprehensive audio analysis library with support for multiple languages, making it suitable for complex audio processing tasks. However, it may have a steeper learning curve and require more setup than nnAudio. nnAudio, being PyTorch-based, offers a simpler interface for basic audio processing tasks, especially when working with neural networks. The choice between the two depends on the specific requirements of your project and your familiarity with audio processing concepts.

aubio

3,427

a library for audio and music analysis

Pros of aubio

Mature and well-established library with a long history of development
Supports a wide range of audio analysis tasks beyond spectrograms
Offers bindings for multiple programming languages

Cons of aubio

Less focused on neural network-based audio processing
May have a steeper learning curve for users primarily interested in spectrogram generation
Potentially slower for certain operations compared to GPU-accelerated alternatives

Code Comparison

aubio:

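import aubio
win_s = 512
hop_s = win_s // 2
src = aubio.source('input.wav', samplerate=44100, hop_size=hop_s)
pv = aubio.pvoc(win_s, hop_s)
specgram = []
while True:
    samples, read = src()
    specgram.append(pv(samples).norm.copy())  # magnitude spectrum of this frame
    if read < hop_s:  # last (partial) block reached
        break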

nnAudio:

import torch
from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT(n_fft=512, hop_length=256)
audio = torch.from_numpy(audio_array).float()  # audio_array: NumPy array of samples
spectrogram = spec_layer(audio)

Both libraries offer spectrogram generation, but nnAudio is more focused on integrating with neural network workflows using PyTorch, while aubio provides a broader set of audio analysis tools. nnAudio may be more suitable for deep learning projects, while aubio excels in traditional audio processing tasks.

pedalboard

5,486

🎛 🔊 A Python library for audio.

Pros of pedalboard

Broader audio processing capabilities, including effects like reverb and distortion
Designed for real-time audio processing, suitable for live applications
Backed by Spotify, potentially offering better long-term support and updates

Cons of pedalboard

Less focused on neural network audio processing tasks
May have a steeper learning curve for users primarily interested in audio analysis

Code Comparison

nnAudio example:

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spec = spec_layer(audio_tensor)

pedalboard example:

from pedalboard import Pedalboard, Reverb, Distortion
board = Pedalboard([Reverb(), Distortion()])
effected = board(audio_array, sample_rate)

Summary

nnAudio focuses on audio analysis and spectrograms for neural networks, while pedalboard offers a wider range of audio processing tools. nnAudio may be more suitable for researchers working on audio-based machine learning tasks, whereas pedalboard is better suited for general audio manipulation and effects processing. The choice between the two depends on the specific requirements of the project and the user's familiarity with audio processing concepts.


README

nnAudio

nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend. This allows spectrograms to be generated from audio on the fly during neural network training, and the Fourier kernels (or CQT kernels) can be trained. Full details of nnAudio can be found in our paper. nnAudio is free to use; if you use this library, please cite the paper using the reference provided below.
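
The idea of computing a spectrogram with a convolutional layer can be sketched in a few lines of plain PyTorch. This is an illustration of the general technique, not nnAudio's actual implementation: the names `fourier_kernels` and `ConvSTFT` are hypothetical, frames are not padded, and a Hann window is assumed. Each output channel of the convolution holds one windowed cosine or sine basis, so sliding the kernels over the waveform with a stride equal to the hop length yields the real and imaginary parts of the STFT.

```python
import math
import torch
import torch.nn.functional as F


def fourier_kernels(n_fft):
    """Hann-windowed cosine and sine kernels, each of shape (n_fft // 2 + 1, 1, n_fft)."""
    n = torch.arange(n_fft, dtype=torch.float64)
    k = torch.arange(n_fft // 2 + 1, dtype=torch.float64).unsqueeze(1)
    window = torch.hann_window(n_fft, periodic=True, dtype=torch.float64)
    angle = 2 * math.pi * k * n / n_fft
    cos_k = (torch.cos(angle) * window).unsqueeze(1).float()
    sin_k = (torch.sin(angle) * window).unsqueeze(1).float()
    return cos_k, sin_k


class ConvSTFT(torch.nn.Module):
    """Magnitude STFT computed with 1D convolutions (illustrative sketch)."""

    def __init__(self, n_fft=512, hop_length=128, trainable=False):
        super().__init__()
        cos_k, sin_k = fourier_kernels(n_fft)
        self.hop_length = hop_length
        # Storing the kernels as parameters is what makes them trainable
        self.cos_k = torch.nn.Parameter(cos_k, requires_grad=trainable)
        self.sin_k = torch.nn.Parameter(sin_k, requires_grad=trainable)

    def forward(self, x):
        x = x.unsqueeze(1)                                   # (batch, 1, samples)
        real = F.conv1d(x, self.cos_k, stride=self.hop_length)
        imag = -F.conv1d(x, self.sin_k, stride=self.hop_length)
        return torch.sqrt(real ** 2 + imag ** 2)             # (batch, freq, frames)


magnitudes = ConvSTFT(n_fft=512, hop_length=128)(torch.randn(2, 4096))
```

With `trainable=False` the result should match `torch.stft` called with the same window and `center=False`, up to floating-point error.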

Kapre is based on a similar concept: it also uses 1D convolutional neural networks to extract spectrograms, but it is built on Keras. Other GPU audio processing tools include torchaudio and tf.signal, but they do not use a neural network approach, so their Fourier basis cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install on Windows because of its sox dependency. nnAudio is more portable across operating systems since it relies mostly on PyTorch convolutional neural networks. The name nnAudio comes from torch.nn.
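
What "trainable" means here can be shown with a generic PyTorch layer: a `Conv1d` is initialised with cosine Fourier kernels, and a single optimiser step then changes those kernels like any other weight. This is a minimal sketch in plain PyTorch, not nnAudio code; the loss is a meaningless placeholder, and the kernel size, stride and learning rate are arbitrary.

```python
import math
import torch

n_fft = 64
n = torch.arange(n_fft, dtype=torch.float32)
k = torch.arange(n_fft // 2 + 1, dtype=torch.float32).unsqueeze(1)

# A convolution whose weights start out as cosine Fourier kernels
conv = torch.nn.Conv1d(1, n_fft // 2 + 1, kernel_size=n_fft, stride=16, bias=False)
with torch.no_grad():
    conv.weight.copy_(torch.cos(2 * math.pi * k * n / n_fft).unsqueeze(1))
initial = conv.weight.detach().clone()

optimizer = torch.optim.SGD(conv.parameters(), lr=1e-3)
torch.manual_seed(0)
audio = torch.randn(4, 1, 1024)          # random stand-in for a batch of audio

loss = conv(audio).pow(2).mean()         # placeholder loss, for illustration only
loss.backward()                          # gradients reach the Fourier kernels
optimizer.step()

kernel_changed = not torch.equal(initial, conv.weight.detach())
```

A transform computed by a fixed FFT routine has no such weights, so there is nothing for the optimiser to update.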

Installation

Install the latest version from GitHub:

pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation

or install release 0.3.1 from PyPI:

pip install nnAudio==0.3.1

Documentation

https://kinwaicheuk.github.io/nnAudio/index.html

Comparison with other libraries

Comparison table (the per-cell support marks are not legible in this copy).

Libraries compared: nnAudio, torch.stft, kapre, torchaudio, tf.signal, torch-stft, librosa

Features compared: Trainable, Differentiable, Linear frequency STFT, Logarithmic frequency STFT, Inverse STFT, Griffin-Lim, Mel, MFCC, CQT, VQT, Gammatone, CFP¹, GPU support

✅: Fully supported ☑️: In development (only available in dev version) ❌: Not supported

¹ Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music

News & Changelog

To view the full changelog, please go to CHANGELOG.md

version 0.3.1 (24 Dec 2021):

Added VQT feature #113

version 0.3.0 (19 Nov 2021):

Changed module naming. nnAudio.Spectrogram will be replaced by nnAudio.features in future releases. For now, the spectrogram classes are accessible under both names.
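
Code that has to run across this rename can try the newer module name first and fall back to the older one. The helper below is a generic sketch using only the standard library; the function name `first_available` is hypothetical.

```python
import importlib


def first_available(module_names):
    """Import and return the first module in module_names that is installed."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:          # also covers ModuleNotFoundError
            continue
    raise ImportError(f"none of {list(module_names)} could be imported")
```

With nnAudio installed, `first_available(["nnAudio.features", "nnAudio.Spectrogram"])` would return whichever of the two modules the installed version provides, and the spectrogram classes can then be looked up on the returned module.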

Please cite the nnAudio paper if you use it

The paper describing the release of nnAudio is available on IEEE Access

K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.

BibTeX

@ARTICLE{9174990,
  author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
  journal={IEEE Access}, 
  title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks}, 
  year={2020},
  volume={8},
  number={},
  pages={161981-162003},
  doi={10.1109/ACCESS.2020.3019084}}

Call for Contributions

nnAudio is a fast-growing package. With the increasing number of feature requests, we welcome anyone who is familiar with digital signal processing and neural networks to contribute to nnAudio. The current list of pending features includes:

Invertible Constant Q Transform (CQT)

(Quick tip for unit tests: cd into the Installation folder, then run pytest. You need at least 1931 MiB of GPU memory to pass all the unit tests.)

Alternatively, you may also contribute by:

Writing better demonstration code or tutorials

Dependencies

NumPy >= 1.14.5

SciPy >= 1.2.0

PyTorch >= 1.6.0 (Griffin-Lim only available after 1.6.0)

Python >= 3.6

librosa = 0.7.0 (In theory, nnAudio depends on librosa, but only for a single function, mel, from librosa.filters. To save users the trouble of installing librosa for this one function, I copied the code corresponding to mel into nnAudio, so nnAudio runs without librosa installed.)
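
The minimum versions above can be checked programmatically. The sketch below compares version strings numerically rather than as text, so that for example 1.10.0 counts as newer than 1.6.0. The helper names and the `MINIMUMS` mapping are hypothetical, and the parsing is deliberately simplified: suffixes such as `+cu101` or `rc1` are ignored.

```python
import re

# Minimum versions taken from the dependency list above
MINIMUMS = {"numpy": "1.14.5", "scipy": "1.2.0", "torch": "1.6.0"}


def version_tuple(version):
    """Leading numeric components of a version string, e.g. '1.6.0+cu101' -> (1, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.6' and '1.6.0' compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

On Python 3.8 or later, the installed version of a package can be read with `importlib.metadata.version("torch")` and passed to `meets_minimum` together with the matching entry of `MINIMUMS`.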

Other similar libraries

Kapre

torch-stft
