KinWaiCheuk / nnAudio

Audio processing using PyTorch 1D convolutional networks

Top Related Projects

  • madmom: Python audio and music signal processing library
  • librosa: Python library for audio and music analysis
  • essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
  • aubio: a library for audio and music analysis
  • pedalboard: 🎛 🔊 A Python library for audio.

Quick Overview

nnAudio is a PyTorch-based audio processing library that implements various audio processing methods as neural network layers. It allows for GPU acceleration of audio signal processing tasks, making it particularly useful for deep learning applications involving audio data.

Pros

  • GPU acceleration for faster audio processing
  • Seamless integration with PyTorch neural networks
  • Implements various audio transforms as neural network layers
  • Generates spectrograms on the fly during neural network training

Cons

  • Requires PyTorch, which may not be suitable for all projects
  • Limited to GPU-enabled environments for optimal performance
  • May have a steeper learning curve compared to traditional audio processing libraries
  • Documentation could be more comprehensive for some advanced features
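
As a rough illustration of the GPU-related points above, the sketch below picks a CUDA device when one is available and falls back to the CPU otherwise, then transforms a whole batch in one call. It uses `torch.stft` rather than nnAudio so that it runs with PyTorch alone; the batch size, `n_fft`, and `hop_length` are arbitrary values chosen for the example.

```python
import torch

def pick_device():
    """Use the GPU when PyTorch can see one, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = pick_device()

# A batch of 8 one-second clips at 22,050 Hz (random data for illustration)
batch = torch.randn(8, 22050, device=device)
window = torch.hann_window(1024, device=device)

# One call transforms the whole batch on the selected device
spec = torch.stft(batch, n_fft=1024, hop_length=256,
                  window=window, return_complex=True).abs()
print(spec.shape, spec.device)
```

The same pattern should apply to any `nn.Module`-based spectrogram layer: move the layer and the audio to the same device before calling it.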

Code Examples

  1. Loading and using a Short-time Fourier Transform (STFT) layer:

    import torch
    from nnAudio import Spectrogram

    # Create an STFT layer
    stft = Spectrogram.STFT(n_fft=2048, hop_length=512)

    # Generate a random audio signal
    audio = torch.randn(1, 44100)

    # Apply STFT
    spectrogram = stft(audio)

  2. Creating a Mel spectrogram:

    import torch
    from nnAudio import Spectrogram

    # Create a Mel spectrogram layer
    mel_spec = Spectrogram.MelSpectrogram(sr=22050, n_fft=2048, n_mels=128)

    # Generate a random audio signal
    audio = torch.randn(1, 22050)

    # Compute Mel spectrogram
    mel_spectrogram = mel_spec(audio)

  3. Applying a Constant-Q Transform:

    import torch
    from nnAudio import Spectrogram

    # Create a Constant-Q Transform layer
    cqt = Spectrogram.CQT(sr=22050, hop_length=512, n_bins=84, bins_per_octave=12)

    # Generate a random audio signal
    audio = torch.randn(1, 22050)

    # Compute CQT
    cqt_spectrogram = cqt(audio)
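
To make the `n_bins` and `bins_per_octave` parameters of the Constant-Q example more concrete, here is a small sketch of how geometrically spaced CQT center frequencies are usually defined. The starting frequency `fmin=32.70` Hz (roughly the note C1) is an assumption made for this example, and `cqt_center_frequencies` is a hypothetical helper, not part of nnAudio.

```python
def cqt_center_frequencies(fmin, n_bins, bins_per_octave):
    """Center frequency of bin k is fmin * 2 ** (k / bins_per_octave)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

# 84 bins at 12 bins per octave span 7 octaves above the assumed fmin
freqs = cqt_center_frequencies(fmin=32.70, n_bins=84, bins_per_octave=12)

nyquist = 22050 / 2
print(f"lowest bin:  {freqs[0]:.2f} Hz")
print(f"highest bin: {freqs[-1]:.2f} Hz (Nyquist is {nyquist:.0f} Hz)")
```

With these settings every bin stays below the Nyquist frequency for `sr=22050`; a larger `n_bins` or a higher `fmin` would eventually exceed it.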

Getting Started

To get started with nnAudio, follow these steps:

  1. Install nnAudio using pip:

    pip install nnAudio
    
  2. Import the necessary modules in your Python script:

    import torch
    from nnAudio import Spectrogram
    
  3. Create an audio processing layer and apply it to your audio data:

    # Create a Mel spectrogram layer
    mel_spec = Spectrogram.MelSpectrogram(sr=22050, n_fft=2048, n_mels=128)
    
    # Load or generate your audio data
    audio = torch.randn(1, 22050)  # Replace with your actual audio data
    
    # Compute Mel spectrogram
    mel_spectrogram = mel_spec(audio)
    
  4. Use the processed audio data in your PyTorch neural network or further analysis.
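
Step 4 can be sketched as a model whose first layer is the spectrogram front end. To keep the example runnable with PyTorch alone, it uses a stand-in front end built on `torch.stft`; `PlaceholderSpectrogram` and `AudioClassifier` are hypothetical names invented for this sketch, and the network sizes are arbitrary.

```python
import torch
import torch.nn as nn

class PlaceholderSpectrogram(nn.Module):
    """Stand-in front end (hypothetical): audio (batch, samples) -> (batch, freq, time)."""
    def __init__(self, n_fft=512, hop_length=256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # A buffer follows the module when it is moved to another device
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, audio):
        spec = torch.stft(audio, self.n_fft, hop_length=self.hop_length,
                          window=self.window, return_complex=True)
        return spec.abs()

class AudioClassifier(nn.Module):
    """Raw audio in, class logits out; the spectrogram is computed inside the model."""
    def __init__(self, frontend, n_classes=10):
        super().__init__()
        self.frontend = frontend
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(8, n_classes)

    def forward(self, audio):
        spec = self.frontend(audio).unsqueeze(1)             # (batch, 1, freq, time)
        features = self.conv(torch.log1p(spec)).flatten(1)   # (batch, 8)
        return self.head(features)

model = AudioClassifier(PlaceholderSpectrogram())
logits = model(torch.randn(4, 22050))
print(logits.shape)
```

Assuming the nnAudio layer also returns a `(batch, freq, time)` tensor, the layer from step 3 could be passed as `frontend` in place of the stand-in.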

Competitor Comparisons

madmom

Python audio and music signal processing library

Pros of madmom

  • More comprehensive audio analysis toolkit with features like beat tracking, onset detection, and chord recognition
  • Established project with longer development history and wider adoption in MIR research
  • Supports both offline and real-time processing of audio signals

Cons of madmom

  • Less focus on GPU acceleration and deep learning integration
  • Steeper learning curve due to more complex architecture and broader feature set
  • Slower development pace with less frequent updates

Code Comparison

madmom example:

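from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor
# The RNN yields a beat activation function; a DBN then decodes beat times from it
activations = RNNBeatProcessor()(audio_file)
beats = DBNBeatTrackingProcessor(fps=100)(activations)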

nnAudio example:

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spectrogram = spec_layer(audio_tensor)

madmom provides higher-level audio analysis functions, while nnAudio focuses on efficient spectrogram computation using neural network layers. nnAudio's approach allows for easier integration with deep learning pipelines and GPU acceleration.

librosa

Python library for audio and music analysis

Pros of librosa

  • Comprehensive audio processing library with a wide range of features
  • Well-established and widely used in the audio research community
  • Extensive documentation and examples available

Cons of librosa

  • CPU-based processing, which can be slower for large-scale operations
  • Not optimized for GPU acceleration or deep learning integration

Code Comparison

librosa:

import librosa
y, sr = librosa.load('audio.wav')
stft = librosa.stft(y)
mel_spec = librosa.feature.melspectrogram(y=y, sr=sr)

nnAudio:

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
mel_layer = Spectrogram.MelSpectrogram()
stft = spec_layer(audio_tensor)
mel_spec = mel_layer(audio_tensor)

Key Differences

  • nnAudio is designed for GPU acceleration and seamless integration with PyTorch
  • librosa offers a broader range of audio processing functions
  • nnAudio focuses on spectrogram generation as neural network layers
  • librosa provides more flexibility for customizing audio analysis parameters
  • nnAudio is better suited for deep learning workflows, while librosa excels in general audio processing tasks

Both libraries have their strengths, and the choice between them depends on the specific requirements of your audio processing project and whether GPU acceleration is necessary.

essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings

Pros of essentia

  • Comprehensive library with a wide range of audio analysis algorithms
  • Supports multiple programming languages (C++, Python, JavaScript)
  • Well-documented and actively maintained by a research institution

Cons of essentia

  • Steeper learning curve due to its extensive feature set
  • Requires more setup and configuration compared to nnAudio
  • May be overkill for simple audio processing tasks

Code Comparison

nnAudio (PyTorch-based):

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spec = spec_layer(audio_tensor)

essentia (Python bindings):

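import essentia.standard as es
audio = es.MonoLoader(filename='audio.wav')()
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
# Window each frame and take its magnitude spectrum
frames = es.FrameGenerator(audio, frameSize=2048, hopSize=512)
spectrogram = [spectrum(w(frame)) for frame in frames]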

Summary

essentia is a comprehensive audio analysis library with support for multiple languages, making it suitable for complex audio processing tasks. However, it may have a steeper learning curve and require more setup than nnAudio. nnAudio, being PyTorch-based, offers a simpler interface for basic audio processing tasks, especially when working with neural networks. The choice between the two depends on the specific requirements of your project and your familiarity with audio processing concepts.

aubio

a library for audio and music analysis

Pros of aubio

  • Mature and well-established library with a long history of development
  • Supports a wide range of audio analysis tasks beyond spectrograms
  • Offers bindings for multiple programming languages

Cons of aubio

  • Less focused on neural network-based audio processing
  • May have a steeper learning curve for users primarily interested in spectrogram generation
  • Potentially slower for certain operations compared to GPU-accelerated alternatives

Code Comparison

aubio:

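import aubio
win_s = 512
hop_s = win_s // 2
src = aubio.source('input.wav', samplerate=44100, hop_size=hop_s)
pv = aubio.pvoc(win_s, hop_s)
specgram = []
# Read hop-sized blocks until the file ends, keeping each frame's magnitudes
while True:
    samples, read = src()
    specgram.append(pv(samples).norm.copy())
    if read < hop_s:
        break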

nnAudio:

import torch
from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT(n_fft=512, hop_length=256)
audio = torch.from_numpy(audio_array).float()  # convolution kernels are float32
spectrogram = spec_layer(audio)

Both libraries offer spectrogram generation, but nnAudio is more focused on integrating with neural network workflows using PyTorch, while aubio provides a broader set of audio analysis tools. nnAudio may be more suitable for deep learning projects, while aubio excels in traditional audio processing tasks.
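
One practical detail when moving audio from a NumPy-based library into a PyTorch layer is the data type: NumPy arrays are often float64, while convolution weights in PyTorch are float32 by default. The sketch below shows the conversion on random data; `audio_array` stands in for samples loaded elsewhere.

```python
import numpy as np
import torch

# Stand-in for samples returned by a NumPy-based loader (float64 by default)
audio_array = np.random.randn(22050)

# Convert to a float32 tensor and add a batch dimension: (1, samples)
audio = torch.from_numpy(audio_array).float().unsqueeze(0)

print(audio_array.dtype, "->", audio.dtype, tuple(audio.shape))
```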

pedalboard

🎛 🔊 A Python library for audio.

Pros of pedalboard

  • Broader audio processing capabilities, including effects like reverb and distortion
  • Designed for real-time audio processing, suitable for live applications
  • Backed by Spotify, potentially offering better long-term support and updates

Cons of pedalboard

  • Less focused on neural network audio processing tasks
  • May have a steeper learning curve for users primarily interested in audio analysis

Code Comparison

nnAudio example:

from nnAudio import Spectrogram
spec_layer = Spectrogram.STFT()
spec = spec_layer(audio_tensor)

pedalboard example:

from pedalboard import Pedalboard, Reverb, Distortion
board = Pedalboard([Reverb(), Distortion()])
effected = board(audio_array, sample_rate)

Summary

nnAudio focuses on audio analysis and spectrograms for neural networks, while pedalboard offers a wider range of audio processing tools. nnAudio may be more suitable for researchers working on audio-based machine learning tasks, whereas pedalboard is better suited for general audio manipulation and effects processing. The choice between the two depends on the specific requirements of the project and the user's familiarity with audio processing concepts.

README

nnAudio

nnAudio is an audio processing toolbox that uses PyTorch 1D convolutional neural networks as its backend. As a result, spectrograms can be generated from audio on the fly during neural network training, and the Fourier kernels (or, e.g., the CQT kernels) can be trained. Kapre follows a similar concept: it also uses 1D convolutional neural networks to extract spectrograms, but it is based on Keras.
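
The idea of computing a spectrogram with a convolutional layer can be sketched as follows: each output channel of a 1D convolution holds one windowed cosine or sine kernel, and the stride plays the role of the hop length. This is a minimal illustration under those assumptions, not nnAudio's actual implementation; `ConvSTFT` is a hypothetical name, and the sketch omits padding and centering.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvSTFT(nn.Module):
    """Magnitude STFT via 1D convolution (illustrative sketch, no centering)."""
    def __init__(self, n_fft=512, hop_length=128, trainable=False):
        super().__init__()
        self.hop_length = hop_length
        # Build the Fourier basis in float64 for accuracy, then cast to float32
        n = torch.arange(n_fft, dtype=torch.float64)
        k = torch.arange(n_fft // 2 + 1, dtype=torch.float64).unsqueeze(1)
        phase = 2 * math.pi * k * n / n_fft
        window = torch.hann_window(n_fft, dtype=torch.float64)
        cos_kernel = (torch.cos(phase) * window).float().unsqueeze(1)  # (bins, 1, n_fft)
        sin_kernel = (torch.sin(phase) * window).float().unsqueeze(1)
        # Parameters can receive gradients when trainable=True
        self.cos_kernel = nn.Parameter(cos_kernel, requires_grad=trainable)
        self.sin_kernel = nn.Parameter(sin_kernel, requires_grad=trainable)

    def forward(self, audio):
        x = audio.unsqueeze(1)  # (batch, 1, samples)
        real = F.conv1d(x, self.cos_kernel, stride=self.hop_length)
        imag = F.conv1d(x, self.sin_kernel, stride=self.hop_length)
        return torch.sqrt(real ** 2 + imag ** 2)  # (batch, bins, frames)

layer = ConvSTFT(n_fft=512, hop_length=128)
audio = torch.randn(2, 4096)
spec = layer(audio)

# Reference: PyTorch's FFT-based STFT with the same window and no centering
reference = torch.stft(audio, 512, hop_length=128, window=torch.hann_window(512),
                       center=False, return_complex=True).abs()
print(spec.shape, torch.allclose(spec, reference, atol=1e-3))
```

Because the kernels are ordinary tensors inside an `nn.Module`, the layer can sit at the front of a network and run wherever the rest of the model runs.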

Other GPU audio processing tools include torchaudio and tf.signal, but they do not use the neural network approach, so their Fourier basis cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install on Windows because of its sox dependency. nnAudio is more portable across operating systems, since it relies mostly on PyTorch convolutional neural networks. The name nnAudio comes from torch.nn.
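
What "trainable" means here can be shown with a plain `nn.Conv1d` whose weights start as cosine Fourier kernels and then change after one optimizer step. This is only a sketch of the general idea: the sizes, the dummy loss, and the learning rate are arbitrary choices for the example, and nothing below is taken from nnAudio's code.

```python
import math
import torch
import torch.nn as nn

n_fft, hop = 64, 16
n = torch.arange(n_fft, dtype=torch.float64)
k = torch.arange(n_fft // 2 + 1, dtype=torch.float64).unsqueeze(1)
cos_basis = torch.cos(2 * math.pi * k * n / n_fft).float().unsqueeze(1)  # (33, 1, 64)

# A convolution layer initialised with the Fourier (cosine) basis
conv = nn.Conv1d(1, n_fft // 2 + 1, kernel_size=n_fft, stride=hop, bias=False)
with torch.no_grad():
    conv.weight.copy_(cos_basis)
initial = conv.weight.detach().clone()

# One gradient step on a dummy loss moves the kernels away from the exact basis
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)
audio = torch.randn(4, 1, 1024)
loss = conv(audio).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

changed = not torch.equal(initial, conv.weight.detach())
print("kernels updated:", changed)
```

With a fixed FFT routine there are no such weights to update, which is the distinction the paragraph above draws.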

Installation

pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation

or

pip install nnAudio==0.3.1

Documentation

https://kinwaicheuk.github.io/nnAudio/index.html

Comparison with other libraries

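| Feature | nnAudio | torch.stft | kapre | torchaudio | tf.signal | torch-stft | librosa |
|---|---|---|---|---|---|---|---|
| Trainable | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ |
| Differentiable | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Linear frequency STFT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Logarithmic frequency STFT | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Inverse STFT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Griffin-Lim | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ |
| Mel | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ |
| MFCC | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ |
| CQT | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| VQT | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Gammatone | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| CFP [1] | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| GPU support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |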

✅: Fully supported ☑️: In development (only available in the dev version) ❌: Not supported

[1] CFP: Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music
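
The "Differentiable" row can be checked for `torch.stft`, which the table marks as differentiable, with a few lines: a loss computed on the spectrogram is backpropagated to the input audio. The signal length and STFT settings are arbitrary values chosen for this sketch.

```python
import torch

# Audio that tracks gradients, as it would inside a larger model
audio = torch.randn(1, 4096, requires_grad=True)
window = torch.hann_window(512)

spec = torch.stft(audio, n_fft=512, hop_length=128,
                  window=window, return_complex=True)

# Mean power of the spectrogram as a dummy loss
loss = spec.abs().pow(2).mean()
loss.backward()

print(audio.grad.shape)  # the gradient has the same shape as the input audio
```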

News & Changelog

To view the full changelog, please go to CHANGELOG.md

version 0.3.1 (24 Dec 2021):

  1. Added VQT feature #113

version 0.3.0 (19 Nov 2021):

  1. Changed module naming. nnAudio.Spectrogram will be replaced by nnAudio.features in future releases. Currently, the various spectrogram types are accessible through both modules.
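
Code that has to work across this rename could try the new module name first and fall back to the old one. The helper below is a hypothetical sketch, not something nnAudio ships; it returns `None` when nnAudio is not installed, so it runs either way.

```python
import importlib

def load_nnaudio_features():
    """Return nnAudio.features if importable, else nnAudio.Spectrogram, else None."""
    for module_name in ("nnAudio.features", "nnAudio.Spectrogram"):
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue
    return None

features = load_nnaudio_features()
print(features.__name__ if features is not None else "nnAudio is not installed")
```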

How to cite nnAudio

The paper for nnAudio is available on IEEE Access

K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.

BibTeX

@ARTICLE{9174990,
  author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
  journal={IEEE Access},
  title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks},
  year={2020},
  volume={8},
  number={},
  pages={161981-162003},
  doi={10.1109/ACCESS.2020.3019084}
}

Call for Contributions

nnAudio is a fast-growing package. With the increasing number of feature requests, we welcome anyone familiar with digital signal processing and neural networks to contribute to nnAudio. The current list of pending features includes:

  1. Invertible Constant Q Transform (CQT)

(Quick tip for unit tests: cd into the Installation folder, then run pytest. You need at least 1931 MiB of GPU memory to pass all the unit tests.)
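
Before running the test suite, one way to check the memory requirement is to ask PyTorch for the total memory of the GPU. The helper name `gpu_has_enough_memory` is hypothetical, and the check looks at total rather than currently free memory, so it is only a rough guide.

```python
import torch

REQUIRED_MIB = 1931  # figure quoted in the tip above

def gpu_has_enough_memory(required_mib=REQUIRED_MIB, device_index=0):
    """Return True if a CUDA device exists and its total memory meets the requirement."""
    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    return total_bytes / (1024 ** 2) >= required_mib

enough = gpu_has_enough_memory()
print("GPU meets the unit test memory requirement:", enough)
```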

Alternatively, you may also contribute by:

  1. Making a better demonstration code or tutorial

Dependencies

NumPy >= 1.14.5

SciPy >= 1.2.0

PyTorch >= 1.6.0 (Griffin-Lim only available after 1.6.0)

Python >= 3.6

librosa = 0.7.0 (In theory, nnAudio depends on librosa, but it only needs a single function, mel, from librosa.filters. To spare users from installing librosa for this one function, the code corresponding to mel is copied into nnAudio, so nnAudio runs without librosa installed.)
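
For readers unfamiliar with what a mel filter function produces, the sketch below builds a bank of triangular filters spaced evenly on the mel scale. It uses the HTK-style formula mel = 2595 · log10(1 + f / 700) and no area normalisation, which is an assumption made for simplicity; it is not the code from librosa or nnAudio and will not match their output exactly. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale (assumed here for simplicity)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=2048, n_mels=128, fmin=0.0, fmax=None):
    """Return an (n_mels, 1 + n_fft // 2) matrix of triangular filters."""
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2, 1 + n_fft // 2)
    # n_mels + 2 edge points, evenly spaced in mel, converted back to Hz
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    weights = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lower, center, upper = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lower) / (center - lower)
        falling = (upper - fft_freqs) / (upper - center)
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights

fb = mel_filterbank()
print(fb.shape)  # one row per mel band, one column per FFT bin
```

Multiplying this matrix by a power spectrogram of shape `(1 + n_fft // 2, frames)` gives a mel spectrogram of shape `(n_mels, frames)`.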

Other similar libraries

Kapre

torch-stft