faster-whisper

Faster Whisper transcription with CTranslate2


Top Related Projects

whisper

85,961

Robust Speech Recognition via Large-Scale Weak Supervision

whisper.cpp

41,097

Port of OpenAI's Whisper model in C/C++

whisperX

16,462

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

faster-whisper

17,373

Faster Whisper transcription with CTranslate2

Whisper

9,434

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Quick Overview

Faster-whisper is an optimized implementation of OpenAI's Whisper model for automatic speech recognition (ASR). It aims to provide faster inference times compared to the original implementation while maintaining accuracy. The project leverages CTranslate2, a fast inference engine for Transformer models, to achieve improved performance.

Pros

Significantly faster inference times compared to the original Whisper implementation
Supports both CPU and GPU acceleration
Maintains comparable accuracy to the original Whisper model
Offers various model sizes to balance between speed and accuracy

Cons

Requires additional dependencies (CTranslate2) compared to the original Whisper
May have slight differences in output compared to the original implementation
Limited to the functionalities provided by the Whisper model
Might require more setup and configuration for optimal performance

Code Examples

Basic transcription:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
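The timestamped segments printed above map naturally onto subtitle formats. The following is a minimal sketch, not part of faster-whisper, that renders any objects exposing `.start`, `.end`, and `.text` as SRT cues. The helper names `format_timestamp` and `segments_to_srt` are hypothetical, and stand-in objects replace real model output so the sketch runs without a model or audio file.

```python
from types import SimpleNamespace


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render objects with .start, .end and .text as numbered SRT cues."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(segment.start)} --> {format_timestamp(segment.end)}\n"
            f"{segment.text.strip()}\n"
        )
    return "\n".join(cues)


# Stand-in segments (hypothetical data) instead of real transcription output.
demo_segments = [
    SimpleNamespace(start=0.0, end=2.5, text=" Hello there."),
    SimpleNamespace(start=2.5, end=3661.25, text=" Goodbye."),
]
srt_text = segments_to_srt(demo_segments)
```

With a real model, the `segments` generator returned by `model.transcribe(...)` could be passed to `segments_to_srt` directly, since it is consumed in a single pass.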

Transcription with language detection:

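from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
# No language is passed, so it is detected from the audio and reported in `info`
segments, info = model.transcribe("audio.mp3", beam_size=5)

print(f"Detected language: {info.language} with probability {info.language_probability:.2f}")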

Transcription with word-level timestamps:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", word_timestamps=True)

for segment in segments:
    # segment.words is populated because word_timestamps=True was requested
    for word in segment.words:
        print(f"[{word.start:.2f}s -> {word.end:.2f}s] {word.word}")

Getting Started

Install faster-whisper:

pip install faster-whisper

Download a model and transcribe audio:

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3")

for segment in segments:
    print(segment.text)

Competitor Comparisons

whisper

85,961

Robust Speech Recognition via Large-Scale Weak Supervision

Pros of Whisper

Original implementation by OpenAI, ensuring authenticity and direct updates
Extensive documentation and community support
Wider range of pre-trained models available

Cons of Whisper

Slower inference speed, especially on CPU
Higher memory usage, which can be problematic on devices with limited resources
Less flexibility in model quantization options

Code Comparison

Whisper:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

Faster-Whisper:

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print(segment.text)

Faster-Whisper aims to improve upon Whisper by offering faster inference and lower memory usage, which is particularly beneficial for CPU-based systems or devices with limited resources. It achieves this through optimizations such as int8 quantization and the CTranslate2 backend. However, Whisper remains the original implementation, with potentially more frequent updates and a larger community backing.

whisper.cpp

41,097

Port of OpenAI's Whisper model in C/C++

Pros of whisper.cpp

Lightweight and efficient C++ implementation, suitable for resource-constrained environments
Supports various quantization levels for model compression
Offers cross-platform compatibility, including mobile devices

Cons of whisper.cpp

GPU support must be enabled at build time, which adds setup effort compared with a pip install
Fewer built-in features compared to faster-whisper
May require more manual configuration and setup

Code Comparison

whisper.cpp:

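#include "whisper.h"
#include <vector>

int main() {
    // Load a ggml-format model file
    struct whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin");
    // Default decoding parameters with greedy sampling
    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    // 16 kHz mono float samples; assumed to be filled by audio-loading code not shown here
    std::vector<float> pcmf32;
    whisper_full(ctx, wparams, pcmf32.data(), pcmf32.size());
    whisper_free(ctx);
}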

faster-whisper:

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

The code snippets demonstrate the basic usage of each library. whisper.cpp exposes a C-style API with manual resource management (whisper_init_from_file and whisper_free), while faster-whisper provides a higher-level Python API that returns timestamped segments.

whisperX

16,462

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Pros of WhisperX

Offers word-level timestamps and speaker diarization
Includes VAD (Voice Activity Detection) for improved accuracy
Supports batch processing for multiple audio files

Cons of WhisperX

May have slower processing speed compared to Faster-Whisper
Requires additional dependencies for diarization features
Less optimized for low-resource environments

Code Comparison

WhisperX:

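import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.mp3")
model = whisperx.load_model("large-v2", device)
result = model.transcribe(audio)
# Alignment uses a separate model, not the Whisper model itself
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device)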

Faster-Whisper:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

Both projects provide efficient Whisper-based speech recognition, and WhisperX is listed among the community projects that use faster-whisper. WhisperX adds speaker diarization and refines word-level timestamps with wav2vec2 alignment, which helps with more detailed analysis at the cost of extra dependencies. Faster-Whisper focuses on speed and resource usage and also offers word-level timestamps through word_timestamps=True, which can make it a better fit for low-resource environments. The choice between the two depends on the project's requirements, such as the need for diarization or for minimal processing overhead.

faster-whisper

17,373

Faster Whisper transcription with CTranslate2

Pros of faster-whisper

Improved transcription speed compared to the original Whisper model
Optimized for efficient CPU and GPU usage
Supports various model sizes and languages

Cons of faster-whisper

Output may differ slightly from the original Whisper model, although the README reports the same accuracy
Requires additional dependencies for optimal performance
Limited documentation and community support compared to more established projects

Code Comparison

faster-whisper:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Note: As the comparison is between the same repository (SYSTRAN/faster-whisper), there is no distinct code comparison to provide. The code snippet above demonstrates the basic usage of faster-whisper for transcription.

Summary

faster-whisper is an optimized implementation of OpenAI's Whisper model, focusing on improved speed and efficiency. While it offers faster transcription and better resource utilization, it may have slight trade-offs in accuracy and requires specific dependencies. The project is actively developed and aims to provide a more efficient alternative to the original Whisper model for various speech recognition tasks.

Whisper

9,434

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

Pros of Whisper

Utilizes DirectCompute for GPU acceleration, potentially offering better performance on Windows systems
Implements custom GPU compute shaders through DirectCompute rather than CUDA kernels
Provides a C++ API for integration into other applications

Cons of Whisper

Limited to Windows platforms, reducing cross-platform compatibility
May require more setup and configuration compared to faster-whisper
Less frequent updates and potentially smaller community support

Code Comparison

faster-whisper:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Whisper:

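// Note: these calls follow the whisper.cpp C API and are shown for illustration only
#include "whisper.h"
#include <vector>

int main() {
    std::vector<float> pcmf32;  // 16 kHz mono float samples, assumed to be loaded elsewhere
    whisper_context * ctx = whisper_init_from_file("ggml-large.bin");
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    whisper_full(ctx, params, pcmf32.data(), pcmf32.size());
    whisper_print_timings(ctx);
    whisper_free(ctx);
}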

Both repositories aim to provide efficient implementations of the Whisper model, but they differ in their approach and target platforms. faster-whisper focuses on cross-platform compatibility and ease of use, while Whisper emphasizes Windows-specific optimizations and low-level control.

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

Broader scope: Supports a wide range of sequence-to-sequence tasks beyond speech recognition
More extensive documentation and examples
Larger community and more frequent updates

Cons of fairseq

Higher complexity and steeper learning curve
Potentially slower inference speed for specific tasks like speech recognition
Requires more setup and configuration for specialized use cases

Code Comparison

fairseq:

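# Illustrative pseudocode only: the actual loading and inference calls depend on the fairseq checkpoint and task
from fairseq.models.wav2vec import Wav2VecCtc

model = Wav2VecCtc.from_pretrained('path/to/model')
emissions = model.predict('audio.wav')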

faster-whisper:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", beam_size=5)

The code snippets demonstrate that faster-whisper is more specialized for speech recognition tasks, offering a simpler API for transcription. fairseq, on the other hand, provides a more general-purpose approach that can be adapted to various sequence-to-sequence tasks but may require additional setup for specific use cases like speech recognition.


README

Faster Whisper transcription with CTranslate2

faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.

This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.

Benchmark

Whisper

For reference, here's the time and memory usage that are required to transcribe 13 minutes of audio using different implementations:

Large-v2 model on GPU

Implementation	Precision	Beam size	Time	VRAM Usage
openai/whisper	fp16	5	2m23s	4708MB
whisper.cpp (Flash Attention)	fp16	5	1m05s	4127MB
transformers (SDPA)¹	fp16	5	1m52s	4960MB
faster-whisper	fp16	5	1m03s	4525MB
faster-whisper (`batch_size=8`)	fp16	5	17s	6090MB
faster-whisper	int8	5	59s	2926MB
faster-whisper (`batch_size=8`)	int8	5	16s	4500MB
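To make the table easier to read, the speedups relative to openai/whisper can be worked out from the reported times. The sketch below only does arithmetic on four rows copied from the table above; the helper name `parse_duration` is hypothetical.

```python
import re


def parse_duration(text: str) -> int:
    """Parse durations such as '2m23s' or '17s' into whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))


# Times copied from the Large-v2 GPU table (13 minutes of audio, beam size 5).
gpu_times = {
    "openai/whisper fp16": "2m23s",
    "faster-whisper fp16": "1m03s",
    "faster-whisper fp16 batch_size=8": "17s",
    "faster-whisper int8": "59s",
}

baseline = parse_duration(gpu_times["openai/whisper fp16"])
speedups = {
    name: round(baseline / parse_duration(duration), 2)
    for name, duration in gpu_times.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

On these rows, sequential fp16 decoding is roughly 2.3 times faster than openai/whisper, and batching with `batch_size=8` brings that to roughly 8.4 times, at the cost of the higher VRAM usage shown in the table.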

distil-whisper-large-v3 model on GPU

Implementation	Precision	Beam size	Time	YT Commons WER
transformers (SDPA) (`batch_size=16`)	fp16	5	46m12s	14.801
faster-whisper (`batch_size=16`)	fp16	5	25m50s	13.527

GPU benchmarks were executed with CUDA 12.4 on an NVIDIA RTX 3070 Ti 8GB.

Small model on CPU

Implementation	Precision	Beam size	Time	RAM Usage
openai/whisper	fp32	5	6m58s	2335MB
whisper.cpp	fp32	5	2m05s	1049MB
whisper.cpp (OpenVINO)	fp32	5	1m45s	1642MB
faster-whisper	fp32	5	2m37s	2257MB
faster-whisper (`batch_size=8`)	fp32	5	1m06s	4230MB
faster-whisper	int8	5	1m42s	1477MB
faster-whisper (`batch_size=8`)	int8	5	51s	3608MB

Executed with 8 threads on an Intel Core i7-12700K.

Requirements

Python 3.9 or greater

Unlike openai-whisper, FFmpeg does not need to be installed on the system. The audio is decoded with the Python library PyAV which bundles the FFmpeg libraries in its package.

GPU

GPU execution requires the following NVIDIA libraries to be installed:

cuBLAS for CUDA 12
cuDNN 9 for CUDA 12

Note: The latest versions of ctranslate2 only support CUDA 12 and cuDNN 9. For CUDA 11 and cuDNN 8, the current workaround is to downgrade to version 3.24.0 of ctranslate2. For CUDA 12 and cuDNN 8, downgrade to version 4.4.0 of ctranslate2. (This can be done with pip install --force-reinstall ctranslate2==4.4.0 or by specifying the version in a requirements.txt.)
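The version constraints in the note above can be captured as a small lookup. This is a sketch only: the function name `ctranslate2_requirement` is hypothetical, and the mapping covers just the three CUDA/cuDNN combinations mentioned in the note.

```python
def ctranslate2_requirement(cuda_major: int, cudnn_major: int) -> str:
    """Return a pip requirement string for the given CUDA/cuDNN major versions.

    Only the combinations described in the note are covered; anything else
    raises ValueError rather than guessing.
    """
    pins = {
        (12, 9): "ctranslate2",          # latest versions target CUDA 12 + cuDNN 9
        (12, 8): "ctranslate2==4.4.0",   # CUDA 12 + cuDNN 8 workaround
        (11, 8): "ctranslate2==3.24.0",  # CUDA 11 + cuDNN 8 workaround
    }
    try:
        return pins[(cuda_major, cudnn_major)]
    except KeyError:
        raise ValueError(
            f"no documented ctranslate2 pin for CUDA {cuda_major} / cuDNN {cudnn_major}"
        ) from None


print(ctranslate2_requirement(12, 8))
```

The returned string could be written into a requirements.txt or passed to `pip install --force-reinstall`.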

There are multiple ways to install the NVIDIA libraries mentioned above. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below.

Other installation methods

Note: For all these methods below, keep in mind the above note regarding CUDA versions. Depending on your setup, you may need to install the CUDA 11 versions of libraries that correspond to the CUDA 12 libraries listed in the instructions below.

Use Docker

The libraries (cuBLAS, cuDNN) are installed in this official NVIDIA CUDA Docker image: nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04.

Install with `pip` (Linux only)

On Linux these libraries can be installed with pip. Note that LD_LIBRARY_PATH must be set before launching Python.

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*

export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`

Download the libraries from Purfview's repository (Windows & Linux)

Purfview's whisper-standalone-win provides the required NVIDIA libraries for Windows & Linux in a single archive. Decompress the archive and place the libraries in a directory included in the PATH.

Installation

The module can be installed from PyPI:

pip install faster-whisper

Other installation methods

Install the master branch

pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"

Install a specific commit

pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz"

Usage

Faster-whisper

from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
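The device and precision choices shown in the comments above can be wrapped in a small helper. This is a sketch under stated assumptions: the names `choose_compute_type` and `load_model` are hypothetical, and `float32` as the non-quantized CPU choice is an assumption based on the fp32 rows in the CPU benchmark table rather than a documented default.

```python
def choose_compute_type(device: str, prefer_int8: bool = False) -> str:
    """Pick a compute_type string for the given device.

    The CUDA and int8 values mirror the usage example; "float32" on CPU is
    an assumption taken from the CPU benchmark rows.
    """
    if device == "cuda":
        return "int8_float16" if prefer_int8 else "float16"
    if device == "cpu":
        return "int8" if prefer_int8 else "float32"
    raise ValueError(f"unsupported device: {device!r}")


def load_model(model_size: str, device: str, prefer_int8: bool = False):
    """Create a WhisperModel with the compute type chosen above."""
    # Imported lazily so choose_compute_type stays usable without the package.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_size,
        device=device,
        compute_type=choose_compute_type(device, prefer_int8),
    )


print(choose_compute_type("cpu", prefer_int8=True))
```

For example, `load_model("large-v3", "cuda", prefer_int8=True)` would correspond to the commented-out `int8_float16` line in the usage example.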

Warning: segments is a generator so the transcription only starts when you iterate over it. The transcription can be run to completion by gathering the segments in a list or a for loop:

segments, _ = model.transcribe("audio.mp3")
segments = list(segments)  # The transcription will actually run here.
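The lazy behavior described in the warning is ordinary Python generator semantics. The sketch below uses a stand-in function, `fake_transcribe`, which is hypothetical and not the faster-whisper API, to show that no work happens until iteration and that a generator can only be consumed once.

```python
def fake_transcribe(chunks, log):
    """Stand-in for a lazy transcription call (not the faster-whisper API)."""

    def generate():
        for chunk in chunks:
            log.append(f"decoded {chunk}")  # the "work" happens only here
            yield chunk.upper()

    return generate(), {"chunks": len(chunks)}


log = []
segments, info = fake_transcribe(["a", "b"], log)

work_before_iteration = list(log)  # still empty: nothing has been decoded yet
results = list(segments)           # iterating triggers the work
work_after_iteration = list(log)
second_pass = list(segments)       # a generator is exhausted after one pass

print(work_before_iteration, results, work_after_iteration, second_pass)
```

This is why collecting the segments into a list is useful when the results are needed more than once.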

Batched Transcription

The following code snippet illustrates how to run batched transcription on an example audio file. BatchedInferencePipeline.transcribe is a drop-in replacement for WhisperModel.transcribe.

from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("turbo", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Faster Distil-Whisper

The Distil-Whisper checkpoints are compatible with the Faster-Whisper package. In particular, the latest distil-large-v3 checkpoint is intrinsically designed to work with the Faster-Whisper transcription algorithm. The following code snippet demonstrates how to run inference with distil-large-v3 on a specified audio file:

from faster_whisper import WhisperModel

model_size = "distil-large-v3"

model = WhisperModel(model_size, device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5, language="en", condition_on_previous_text=False)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

For more information about the distil-large-v3 model, refer to the original model card.

Word-level timestamps

segments, _ = model.transcribe("audio.mp3", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
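Word-level timestamps are often regrouped into short caption lines. The following sketch is not part of faster-whisper: it groups any objects exposing `.start`, `.end`, and `.word` into lines of at most `max_chars` characters. It assumes that each word string carries its own leading space, and the names `group_words` and `close_line` are hypothetical.

```python
from types import SimpleNamespace


def close_line(words):
    """Collapse a run of words into a (start, end, text) tuple."""
    text = "".join(w.word for w in words).strip()
    return (words[0].start, words[-1].end, text)


def group_words(words, max_chars=20):
    """Group word objects into caption lines no longer than max_chars."""
    lines = []
    current = []
    for word in words:
        candidate = "".join(w.word for w in current + [word]).strip()
        if current and len(candidate) > max_chars:
            # Adding this word would overflow the line, so start a new one.
            lines.append(close_line(current))
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(close_line(current))
    return lines


# Stand-in words (hypothetical data) instead of real segment.words output.
demo_words = [
    SimpleNamespace(start=0.0, end=0.2, word=" The"),
    SimpleNamespace(start=0.2, end=0.5, word=" quick"),
    SimpleNamespace(start=0.5, end=0.8, word=" brown"),
    SimpleNamespace(start=0.8, end=1.0, word=" fox"),
]
caption_lines = group_words(demo_words, max_chars=10)
```

A single word longer than `max_chars` still gets its own line in this sketch rather than being split.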

VAD filter

The library integrates the Silero VAD model to filter out parts of the audio without speech:

segments, _ = model.transcribe("audio.mp3", vad_filter=True)

The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the source code. They can be customized with the dictionary argument vad_parameters:

segments, _ = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)

The VAD filter is enabled by default for batched transcription.
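To build intuition for `min_silence_duration_ms`, the sketch below merges speech spans that are separated by a silence shorter than the threshold, so only longer silences remain as gaps. This is a conceptual illustration under that assumption, not the Silero VAD algorithm or the library's implementation, and `merge_speech_spans` is a hypothetical name.

```python
def merge_speech_spans(spans, min_silence_ms):
    """Merge (start_ms, end_ms) spans whose separating silence is shorter than min_silence_ms."""
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] < min_silence_ms:
            # The silence is too short to count as a break: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical speech spans with silences of 800 ms and 3500 ms between them.
spans = [(0, 1000), (1800, 2500), (6000, 7000)]

conservative = merge_speech_spans(spans, min_silence_ms=2000)
aggressive = merge_speech_spans(spans, min_silence_ms=500)

print(conservative)
print(aggressive)
```

With a 2000 ms threshold only the 3500 ms silence is treated as removable, while a 500 ms threshold also treats the 800 ms pause as a break.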

Logging

The library logging level can be configured like this:

import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

Going further

See more model and transcription options in the WhisperModel class implementation.

Community integrations

Here is a non-exhaustive list of open-source projects using faster-whisper. Feel free to add your project to the list!

speaches is an OpenAI compatible server using faster-whisper. It's easily deployable with Docker, works with OpenAI SDKs/CLI, supports streaming, and live transcription.
WhisperX is an award-winning Python library that offers speaker diarization and accurate word-level timestamps using wav2vec2 alignment.
whisper-ctranslate2 is a command line client based on faster-whisper and compatible with the original client from openai/whisper.
whisper-diarize is a speaker diarization tool that is based on faster-whisper and NVIDIA NeMo.
whisper-standalone-win provides standalone CLI executables of faster-whisper for Windows, Linux & macOS.
asr-sd-pipeline provides a scalable, modular, end to end multi-speaker speech to text solution implemented using AzureML pipelines.
Open-Lyrics is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into .lrc files in the desired language using OpenAI-GPT.
wscribe is a flexible transcript generation tool supporting faster-whisper. It can export word-level transcripts, which can then be edited with wscribe-editor.
aTrain is a graphical user interface implementation of faster-whisper developed at the BANDAS-Center at the University of Graz for transcription and diarization in Windows (Windows Store App) and Linux.
Whisper-Streaming implements real-time mode for offline Whisper-like speech-to-text models with faster-whisper as the most recommended back-end. It implements a streaming policy with self-adaptive latency based on the actual source complexity, and demonstrates the state of the art.
WhisperLive is a nearly-live implementation of OpenAI's Whisper which uses faster-whisper as the backend to transcribe audio in real-time.
Faster-Whisper-Transcriber is a simple but reliable voice transcriber that provides a user-friendly interface.
Open-dubbing is an AI dubbing system that uses machine learning models to automatically translate and synchronize audio dialogue into different languages.

Model conversion

When loading a model from its size such as WhisperModel("large-v3"), the corresponding CTranslate2 model is automatically downloaded from the Hugging Face Hub.

We also provide a script to convert any Whisper models compatible with the Transformers library. They could be the original OpenAI models or user fine-tuned models.

For example the command below converts the original "large-v3" Whisper model and saves the weights in FP16:

pip install "transformers[torch]>=4.23"

ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2 \
    --copy_files tokenizer.json preprocessor_config.json --quantization float16

The option --model accepts a model name on the Hub or a path to a model directory.
If the option --copy_files tokenizer.json is not used, the tokenizer configuration is automatically downloaded when the model is loaded later.

Models can also be converted from the code. See the conversion API.
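As a rough Python equivalent of the command above, the conversion can be sketched with the `TransformersConverter` class from CTranslate2. The wrapper name `convert_whisper` and its defaults are hypothetical, and the sketch assumes that ctranslate2 and transformers are installed. The import is deferred so that defining the function does not require either package.

```python
def convert_whisper(
    model_name: str = "openai/whisper-large-v3",
    output_dir: str = "whisper-large-v3-ct2",
    quantization: str = "float16",
) -> str:
    """Convert a Transformers Whisper checkpoint to the CTranslate2 format.

    Mirrors the CLI example: copies the tokenizer and preprocessor files
    and saves the weights with the requested quantization.
    """
    # Deferred import: only needed when the conversion actually runs.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        model_name,
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    # convert() writes the model to output_dir and returns that path.
    return converter.convert(output_dir, quantization=quantization)
```

Calling `convert_whisper()` downloads the checkpoint from the Hub if it is not cached, so it is best run once and the output directory reused.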

Load a converted model

Directly load the model from a local directory:

import faster_whisper
model = faster_whisper.WhisperModel("whisper-large-v3-ct2")

Upload your model to the Hugging Face Hub and load it from its name:

model = faster_whisper.WhisperModel("username/whisper-large-v3-ct2")

Comparing performance against other implementations

If you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings. In particular:

Verify that the same transcription options are used, especially the same beam size. For example, in openai/whisper, model.transcribe uses a default beam size of 1, whereas faster-whisper uses a default beam size of 5.
Transcription speed is closely affected by the number of words in the transcript, so ensure that other implementations have a similar WER (Word Error Rate) to this one.
When running on CPU, make sure to set the same number of threads. Many frameworks will read the environment variable OMP_NUM_THREADS, which can be set when running your script:

OMP_NUM_THREADS=4 python3 my_script.py
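When timing implementations against each other, the measurement should cover the full decoding work, which for a lazy API means consuming every segment. The sketch below is one possible harness, not an official benchmark script. The names `time_transcription` and `fake_transcribe` are hypothetical, and a stand-in callable replaces the real model so the example runs anywhere.

```python
import time


def time_transcription(transcribe, audio_seconds: float) -> dict:
    """Time a callable returning (segments, info), consuming all segments.

    real_time_factor is elapsed time divided by audio duration, so values
    below 1.0 mean faster than real time.
    """
    started = time.perf_counter()
    segments, _info = transcribe()
    collected = list(segments)  # force a lazy generator to run to completion
    elapsed = time.perf_counter() - started
    return {
        "segments": len(collected),
        "elapsed": elapsed,
        "real_time_factor": elapsed / audio_seconds,
    }


def fake_transcribe():
    """Stand-in for a real transcription call (hypothetical)."""
    return (text for text in ["one", "two", "three"]), None


report = time_transcription(fake_transcribe, audio_seconds=13 * 60)
print(report)
```

With a real model, the callable could be `lambda: model.transcribe("audio.mp3", beam_size=5)`, using the same beam size and thread count for every implementation being compared.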

¹ transformers runs out of memory (OOM) for any batch size > 1.
