tyiannak/pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Top Related Projects

  • librosa (7,088 stars): Python library for audio and music analysis
  • aubio (3,291 stars): a library for audio and music analysis
  • essentia: C++ library for audio and music analysis, description and synthesis, including Python bindings
  • madmom (1,304 stars): Python audio and music signal processing library
  • Magenta (19,130 stars): Music and Art Generation with Machine Intelligence

Quick Overview

pyAudioAnalysis is a Python library for audio feature extraction, classification, segmentation, and application development. It covers a wide range of audio analysis tasks, including audio event detection, silence removal, speaker diarization, audio thumbnailing, and regression (e.g. emotion recognition). The library is designed to be user-friendly and suits both researchers and developers working with audio data.

Pros

  • Comprehensive set of audio analysis tools and features
  • Easy-to-use API with well-documented functions
  • Supports various audio file formats and real-time audio processing
  • Includes pre-trained models for common audio classification tasks

Cons

  • Dependency on older versions of some libraries, which may cause compatibility issues
  • Limited support for deep learning-based audio analysis techniques
  • Some features may require additional setup or external dependencies
  • Documentation could be more extensive for advanced use cases

Code Examples

  1. Feature extraction from an audio file:
from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures

[Fs, x] = audioBasicIO.read_audio_file("sample.wav")
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)
  2. Audio classification:
from pyAudioAnalysis import audioTrainTest as aT

aT.file_classification("unknown_audio.wav", "model_file", "svm")
  3. Audio segmentation:
from pyAudioAnalysis import audioSegmentation as aS

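# Returns per-window class labels, the class names, and an accuracy and
# confusion matrix that are only meaningful when a ground-truth file is given.
flags, classes, accuracy, confusion = aS.mid_term_file_classification("long_audio.wav", "model_file", "svm", True)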
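
The segmentation call above yields, among other things, one class index per analysis window. A common follow-up is to merge runs of identical labels into time segments. The sketch below shows one way to do that in plain Python; the helper name `flags_to_segments` and the 1-second window step are assumptions made for illustration, not part of the library's API.

```python
def flags_to_segments(flags, class_names, step_sec=1.0):
    """Merge consecutive identical per-window labels into (start, end, name) segments.

    flags       : sequence of class indices, one per analysis window
    class_names : list mapping a class index to its name
    step_sec    : assumed spacing between windows, in seconds (hypothetical default)
    """
    segments = []
    if not flags:
        return segments
    start = 0
    for i in range(1, len(flags) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(flags) or flags[i] != flags[start]:
            name = class_names[int(flags[start])]
            segments.append((start * step_sec, i * step_sec, name))
            start = i
    return segments


# Made-up labels for six one-second windows.
example_flags = [0, 0, 0, 1, 1, 0]
for seg_start, seg_end, name in flags_to_segments(example_flags, ["music", "speech"]):
    print(f"{seg_start:4.1f}s - {seg_end:4.1f}s  {name}")
```

With the made-up labels above, the windows collapse into three segments: music, then speech, then music again.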

Getting Started

To get started with pyAudioAnalysis, follow these steps:

  1. Install the library using pip:

    pip install pyAudioAnalysis
    
  2. Import the necessary modules in your Python script:

    from pyAudioAnalysis import audioBasicIO
    from pyAudioAnalysis import ShortTermFeatures
    from pyAudioAnalysis import audioTrainTest as aT
    from pyAudioAnalysis import audioSegmentation as aS
    
  3. Use the library's functions to analyze your audio files or streams. For example, to extract features from an audio file:

    [Fs, x] = audioBasicIO.read_audio_file("your_audio_file.wav")
    F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)
    

For more detailed instructions and examples, refer to the library's documentation and GitHub repository.
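
The last two arguments in the feature extraction call above, `0.050*Fs` and `0.025*Fs`, are the short-term window and step expressed in samples (50 ms and 25 ms). The sketch below works out what that means for a given recording. The helper `frame_layout` is hypothetical, and the frame-count formula is the usual one for sliding windows without padding; whether the library counts frames in exactly this way is an assumption.

```python
def frame_layout(num_samples, fs, win_sec=0.050, step_sec=0.025):
    """Return (window, step, n_frames) in samples for a sliding-window analysis.

    Assumes no padding: only windows that fit entirely inside the signal count.
    """
    window = int(round(win_sec * fs))
    step = int(round(step_sec * fs))
    if num_samples < window:
        return window, step, 0
    n_frames = (num_samples - window) // step + 1
    return window, step, n_frames


# One second of audio sampled at 16 kHz.
window, step, n_frames = frame_layout(num_samples=16000, fs=16000)
print(window, step, n_frames)  # 800 400 39
```

Under these assumptions, each column of the returned feature matrix would describe one 50 ms frame, with consecutive frames overlapping by half.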

Competitor Comparisons

librosa (7,088 stars)

Python library for audio and music analysis

Pros of librosa

  • More comprehensive and feature-rich library for music and audio analysis
  • Better documentation and community support
  • Actively maintained with frequent updates

Cons of librosa

  • Steeper learning curve for beginners
  • May be overkill for simple audio processing tasks
  • Slower performance for some operations compared to pyAudioAnalysis

Code Comparison

pyAudioAnalysis:

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures

[Fs, x] = audioBasicIO.read_audio_file("sample.wav")
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)

librosa:

import librosa

y, sr = librosa.load("sample.wav")
mfccs = librosa.feature.mfcc(y=y, sr=sr)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

Both libraries offer audio file reading and feature extraction capabilities. pyAudioAnalysis provides a more straightforward approach for basic audio analysis, while librosa offers more advanced and customizable features for music and audio processing.

aubio (3,291 stars)

a library for audio and music analysis

Pros of aubio

  • Written in C, offering better performance for low-level audio processing tasks
  • Provides a wider range of audio analysis algorithms, including pitch detection, onset detection, and beat tracking
  • Offers bindings for multiple programming languages, including Python, making it versatile for different development environments

Cons of aubio

  • Steeper learning curve due to its lower-level implementation and more complex API
  • Less focus on machine learning-based audio analysis compared to pyAudioAnalysis
  • Requires compilation and may have more complex setup process, especially on certain platforms

Code Comparison

pyAudioAnalysis example:

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures

[Fs, x] = audioBasicIO.read_audio_file("example.wav")
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)

aubio example:

import aubio

source = aubio.source("example.wav")
pitch_o = aubio.pitch("yin", samplerate=source.samplerate)

pitches = []
for frame in source:
    pitch = pitch_o(frame)[0]
    pitches.append(pitch)

essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings

Pros of essentia

  • More comprehensive library with a wider range of audio analysis algorithms
  • Better performance and efficiency, especially for large-scale audio processing
  • Supports multiple programming languages (C++, Python, JavaScript)

Cons of essentia

  • Steeper learning curve due to its complexity and extensive feature set
  • Requires more setup and configuration compared to pyAudioAnalysis
  • Less focused on high-level audio analysis tasks

Code Comparison

pyAudioAnalysis:

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures

[Fs, x] = audioBasicIO.read_audio_file("example.wav")
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)

essentia:

import essentia.standard as es

audio = es.MonoLoader(filename='example.wav')()
w = es.Windowing(type='hann')
spectrum = es.Spectrum()
mfcc = es.MFCC()

for frame in es.FrameGenerator(audio, frameSize=1024, hopSize=512):
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))

Both libraries offer audio analysis capabilities, but essentia provides more flexibility and advanced features at the cost of increased complexity. pyAudioAnalysis is more straightforward for basic audio analysis tasks, while essentia is better suited for more complex and large-scale audio processing projects.

madmom (1,304 stars)

Python audio and music signal processing library

Pros of madmom

  • More focused on music information retrieval (MIR) tasks
  • Offers deep learning-based algorithms for advanced audio analysis
  • Provides specialized tools for beat tracking and tempo estimation

Cons of madmom

  • Steeper learning curve due to its more specialized nature
  • Less comprehensive documentation compared to pyAudioAnalysis
  • Fewer general-purpose audio analysis features

Code Comparison

madmom example:

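from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor
# audio_file is the path to an audio file, e.g. "example.wav"
activations = RNNBeatProcessor()(audio_file)             # frame-wise beat activations
beats = DBNBeatTrackingProcessor(fps=100)(activations)   # beat times in seconds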

pyAudioAnalysis example:

from pyAudioAnalysis import audioBasicIO as aIO
from pyAudioAnalysis import ShortTermFeatures as aF
[Fs, x] = aIO.read_audio_file(audio_file)
F, f_names = aF.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)

madmom is more specialized for music-related tasks, offering advanced algorithms for beat detection and tempo estimation. It's particularly useful for MIR applications but may have a steeper learning curve.

pyAudioAnalysis provides a broader range of audio analysis features and is generally easier to use for beginners. It's more suitable for general-purpose audio analysis tasks but may lack some of the advanced music-specific functionalities offered by madmom.

Choose madmom for music-focused projects requiring advanced MIR capabilities, and pyAudioAnalysis for general audio analysis tasks or when ease of use is a priority.

Magenta (19,130 stars)

Magenta: Music and Art Generation with Machine Intelligence

Pros of Magenta

  • Broader scope, covering music and art generation with machine learning
  • More active development and larger community support
  • Extensive documentation and tutorials for various use cases

Cons of Magenta

  • Steeper learning curve due to its complexity and breadth
  • Requires more computational resources for many tasks
  • Less focused on audio analysis compared to pyAudioAnalysis

Code Comparison

pyAudioAnalysis example:

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
[Fs, x] = audioBasicIO.read_audio_file("example.wav")
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)

Magenta example:

import note_seq
from note_seq.protobuf import music_pb2

sequence = music_pb2.NoteSequence()
note = sequence.notes.add()
note.start_time = 0
note.end_time = 0.5
note.pitch = 60
note.velocity = 80

Both repositories offer powerful tools for audio and music processing, but they serve different purposes. pyAudioAnalysis focuses on audio feature extraction and classification, while Magenta provides a broader set of tools for music generation and creative applications using machine learning.

README

A Python library for audio feature extraction, classification, segmentation and applications

This is general info. The complete documentation is in the project wiki, and a more generic intro to audio data handling is linked from the repository.

News

  • [2022-01-01] If you are not interested in training audio models from your own data, you can check the Deep Audio API, where you can directly send audio data and receive predictions about the respective audio content (speech vs silence, musical genre, speaker gender, etc).
  • [2021-08-06] deep-audio-features: deep audio classification and feature extraction using CNNs and Pytorch
  • Check out paura, a Python script for realtime recording and analysis of audio data

General

pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:

  • Extract audio features and representations (e.g. MFCCs, spectrogram, chromagram)
  • Train, parameter tune and evaluate classifiers of audio segments
  • Classify unknown sounds
  • Detect audio events and exclude silence periods from long recordings
  • Perform supervised segmentation (joint segmentation - classification)
  • Perform unsupervised segmentation (e.g. speaker diarization) and extract audio thumbnails
  • Train and use audio regression models (example application: emotion recognition)
  • Apply dimensionality reduction to visualize audio data and content similarities
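
To make the first and fourth items in the list above more concrete, the sketch below computes two simple short-term features (frame energy and zero-crossing rate) in plain Python and uses an energy threshold to mark non-silent frames. This is not the library's silence-removal method; a plain threshold is swapped in for illustration, and all function names and the threshold value are hypothetical.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def active_frames(signal, window, step, energy_threshold):
    """Return one boolean per frame: True where energy exceeds the threshold."""
    flags = []
    for start in range(0, len(signal) - window + 1, step):
        frame = signal[start:start + window]
        flags.append(frame_energy(frame) > energy_threshold)
    return flags


# Synthetic test signal: half a second of silence, then half a second of a 440 Hz tone.
fs = 8000
silence = [0.0] * (fs // 2)
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 2)]
signal = silence + tone

# 50 ms non-overlapping frames; the threshold is an arbitrary illustrative value.
flags = active_frames(signal, window=400, step=400, energy_threshold=0.01)
print(flags)
print(round(zero_crossing_rate(tone[:400]), 3))
```

On this synthetic signal the first ten frames are flagged as silent and the last ten as active. Real recordings contain background noise, so a fixed threshold like this one would need tuning or a learned model.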

Installation

  • Clone the source of this library: git clone https://github.com/tyiannak/pyAudioAnalysis.git
  • Install dependencies: pip install -r ./requirements.txt
  • Install using pip: pip install -e .

An audio classification example

More examples and detailed tutorials can be found in the wiki.

pyAudioAnalysis provides easy-to-call wrappers to execute audio analysis tasks. E.g., this code first trains an audio segment classifier from a set of WAV files stored in folders (each folder representing a different class), then uses the trained classifier to classify an unknown WAV file:

from pyAudioAnalysis import audioTrainTest as aT
aT.extract_features_and_train(["classifierData/music","classifierData/speech"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svmSMtemp", False)
aT.file_classification("data/doremi.wav", "svmSMtemp","svm")

Result: (0.0, array([ 0.90156761, 0.09843239]), ['music', 'speech'])
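
The result shown above is a tuple of the winning class index, the per-class probabilities, and the class names. The sketch below unpacks such a tuple into something readable. It reuses the values from the README, with a plain list standing in for the NumPy array, and the helper `describe_prediction` is a hypothetical name.

```python
def describe_prediction(result):
    """Unpack (class_id, probabilities, class_names) into a winner and a ranking."""
    class_id, probabilities, class_names = result
    winner = class_names[int(class_id)]
    # Pair each class with its probability, most likely first.
    ranked = sorted(zip(class_names, probabilities), key=lambda pair: pair[1], reverse=True)
    return winner, ranked


# Values copied from the result above; the list replaces the NumPy array.
result = (0.0, [0.90156761, 0.09843239], ["music", "speech"])
winner, ranked = describe_prediction(result)
print("Predicted class:", winner)
for name, probability in ranked:
    print(f"  {name}: {probability:.1%}")
```

Here the file is classified as music with a probability of about 90%.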

In addition, command-line support is provided for all functionalities. E.g. the following command extracts the spectrogram of an audio signal stored in a WAV file: python audioAnalysis.py fileSpectrogram -i data/doremi.wav

Further reading

Apart from this README file, to better understand how to use this library one should read the following:

@article{giannakopoulos2015pyaudioanalysis,
  title={pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis},
  author={Giannakopoulos, Theodoros},
  journal={PloS one},
  volume={10},
  number={12},
  year={2015},
  publisher={Public Library of Science}
}

For Matlab-related audio analysis material check this book.

Author

Theodoros Giannakopoulos, Principal Researcher of Multimodal Machine Learning at the Multimedia Analysis Group of the Computational Intelligence Lab (MagCIL) of the Institute of Informatics and Telecommunications, of the National Center for Scientific Research "Demokritos"