Top Related Projects
- CorentinJ/Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
- facebookresearch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python
- mozilla/TTS: Deep learning for Text to Speech (discussion forum: https://discourse.mozilla.org/c/tts)
- NVIDIA/tacotron2: Tacotron 2, a PyTorch implementation with faster-than-realtime inference
- resemble-ai/Resemblyzer: A Python package to analyze and compare voices with deep learning
Quick Overview
The w-okada/voice-changer repository is a Python-based project that provides tools for real-time voice conversion. It lets users modify the pitch, timbre, and other characteristics of a voice, either on live audio input or on pre-recorded audio files.
Pros
- Real-time processing: The library supports real-time voice conversion, enabling live applications such as video conferencing or voice chat.
- Flexible configuration: Users can customize the voice conversion parameters to achieve a wide range of voice transformations.
- Cross-platform compatibility: The library can be used on various operating systems, including Windows, macOS, and Linux.
- Open-source: The project is open-source, allowing for community contributions and further development.
Cons
- Complexity: The library may have a steep learning curve for users unfamiliar with audio processing and machine learning concepts.
- Performance limitations: Depending on the hardware and the complexity of the voice conversion, the library may not be able to achieve real-time performance on all devices.
- Limited pre-trained models: The library currently provides a limited set of pre-trained voice conversion models, which may not cover all desired voice transformations.
- Potential privacy concerns: Users should be aware of the potential privacy implications when using voice conversion technology, especially in live applications.
Code Examples
Here are a few code examples demonstrating usage of the voice-changer library (the VoiceChanger API shown below is illustrative):
- Real-time voice conversion:
from voice_changer import VoiceChanger
vc = VoiceChanger()
vc.start_stream()
while True:
    audio_frame = vc.get_audio_frame()
    converted_audio = vc.convert_voice(audio_frame)
    # Process the converted audio (e.g., play it, save it to a file)
- Batch voice conversion:
from voice_changer import VoiceChanger
vc = VoiceChanger()
vc.load_audio_file('input_audio.wav')
converted_audio = vc.convert_voice()
vc.save_audio_file('output_audio.wav')
- Adjusting voice conversion parameters:
from voice_changer import VoiceChanger
vc = VoiceChanger()
vc.set_pitch_shift(2.0) # Shift the pitch up by 2 semitones
vc.set_timbre_shift(0.5) # Shift the timbre by 0.5
converted_audio = vc.convert_voice()
- Using pre-trained voice conversion models:
from voice_changer import VoiceChanger
vc = VoiceChanger(model_name='male_to_female')
vc.load_audio_file('input_audio.wav')
converted_audio = vc.convert_voice()
vc.save_audio_file('output_audio.wav')
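The set_pitch_shift(2.0) call above is specified in semitones; a shift of n semitones corresponds to a frequency ratio of 2**(n/12). The sketch below illustrates that relationship with plain NumPy resampling. It is a naive stand-in, not the library's implementation, and it also changes duration because there is no time stretching:

```python
import numpy as np

def naive_pitch_shift(audio: np.ndarray, semitones: float) -> np.ndarray:
    """Pitch-shift by resampling at a 2**(semitones/12) rate ratio.
    Naive: the output is also shorter/longer than the input."""
    ratio = 2.0 ** (semitones / 12.0)
    positions = np.arange(0, len(audio), ratio)  # resampled read positions
    return np.interp(positions, np.arange(len(audio)), audio)

# A 440 Hz tone shifted up 12 semitones (one octave) doubles in frequency.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
shifted = naive_pitch_shift(tone, 12.0)
print(len(shifted))  # 8000: half of the original 16000 samples
```

Real voice changers pair this ratio with a time-stretching step (e.g., a phase vocoder) so pitch moves while duration stays fixed.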
Getting Started
To get started with the w-okada/voice-changer library, follow these steps:
- Install the library using pip:
pip install voice-changer
- Import the VoiceChanger class and create an instance:
from voice_changer import VoiceChanger
vc = VoiceChanger()
- Load an audio file or start a real-time audio stream:
vc.load_audio_file('input_audio.wav')
# or
vc.start_stream()
- Convert the voice using the convert_voice() method:
converted_audio = vc.convert_voice()
- Save the converted audio to a file:
vc.save_audio_file('output_audio.wav')
- Adjust the voice conversion parameters as needed:
vc.set_pitch_shift(2.0)
vc.set_timbre_shift(0.5)
- Explore the available pre-trained voice conversion models:
vc = VoiceChanger(model_name='male_to_female')
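The load and save steps above ultimately come down to ordinary WAV file I/O, which Python's standard wave module already covers. A minimal round-trip, independent of any conversion model (the path here is just a temp file):

```python
import os
import tempfile
import wave

# Write one second of 16-bit mono silence at 16 kHz, then read it back.
path = os.path.join(tempfile.mkdtemp(), "input_audio.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)   # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)

with wave.open(path, "rb") as r:
    frames = r.getnframes()
print(frames)  # 16000
```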
Competitor Comparisons
CorentinJ/Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
Pros of Real-Time-Voice-Cloning
- More comprehensive voice cloning solution, including voice encoding and synthesis
- Implements a pre-trained speaker verification model for enhanced accuracy
- Provides a graphical user interface for easier interaction
Cons of Real-Time-Voice-Cloning
- Less focus on real-time processing compared to voice-changer
- May require more computational resources due to its comprehensive approach
- Limited to English language support
Code Comparison
Real-Time-Voice-Cloning:
def load_model(weights_fpath):
    model = SpeakerEncoder()
    checkpoint = torch.load(weights_fpath)
    model.load_state_dict(checkpoint["model_state"])
    return model
voice-changer:
def get_audio_data(self):
    data = self.stream.read(self.chunk)
    return np.frombuffer(data, dtype=np.int16)
The Real-Time-Voice-Cloning code snippet shows model loading, while the voice-changer code focuses on real-time audio processing. This highlights the different priorities of each project, with Real-Time-Voice-Cloning emphasizing comprehensive voice cloning and voice-changer prioritizing real-time functionality.
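The np.frombuffer step in the voice-changer snippet can be exercised on its own; decoding 16-bit PCM bytes and normalizing to floats in [-1, 1] is the usual preprocessing before any model sees the audio (the PyAudio-style stream object is omitted here):

```python
import numpy as np

def bytes_to_float_audio(raw: bytes) -> np.ndarray:
    """Decode 16-bit PCM bytes and scale to [-1.0, 1.0]."""
    samples = np.frombuffer(raw, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0

# Two samples: the maximum positive int16 value, then silence.
raw = np.array([32767, 0], dtype=np.int16).tobytes()
audio = bytes_to_float_audio(raw)
print(audio)  # first sample ~1.0, second exactly 0.0
```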
facebookresearch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python
Pros of fairseq
- Comprehensive sequence-to-sequence modeling toolkit with support for various tasks
- Highly optimized and efficient implementation for large-scale training
- Extensive documentation and active community support
Cons of fairseq
- Steeper learning curve due to its complexity and wide range of features
- Requires more computational resources for training and inference
- Less focused on real-time voice conversion applications
Code comparison
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model')
translated = model.translate('Hello world!')
print(translated)
voice-changer:
from voice_changer import VoiceChanger
vc = VoiceChanger(model_path='/path/to/model')
converted_audio = vc.convert('input_audio.wav', target_speaker='speaker_id')
vc.save_audio(converted_audio, 'output_audio.wav')
Key differences
- fairseq is a general-purpose sequence-to-sequence toolkit, while voice-changer focuses specifically on voice conversion
- fairseq offers more flexibility and customization options, but voice-changer provides a simpler API for voice conversion tasks
- fairseq is better suited for research and large-scale applications, while voice-changer is more accessible for quick voice conversion projects
mozilla/TTS: Deep learning for Text to Speech (discussion forum: https://discourse.mozilla.org/c/tts)
Pros of TTS
- Comprehensive text-to-speech toolkit with multiple models and languages
- Well-documented and actively maintained by Mozilla
- Supports both training and inference
Cons of TTS
- Focused solely on text-to-speech, lacking voice conversion capabilities
- May require more technical expertise to set up and use effectively
- Larger project scope, potentially overwhelming for simple use cases
Code Comparison
TTS:
from TTS.api import TTS
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Hello world!", file_path="output.wav")
voice-changer:
from voice_changer import VoiceChanger
vc = VoiceChanger(model_path="path/to/model")
vc.convert("input.wav", "output.wav", target_speaker="speaker_id")
Key Differences
- TTS focuses on generating speech from text, while voice-changer primarily handles voice conversion
- TTS offers a wider range of pre-trained models and languages
- voice-changer provides a simpler interface for voice conversion tasks
- TTS is better suited for large-scale text-to-speech applications
- voice-changer is more appropriate for real-time voice modification and conversion
NVIDIA/tacotron2: Tacotron 2, a PyTorch implementation with faster-than-realtime inference
Pros of Tacotron2
- Highly advanced text-to-speech synthesis model
- Backed by NVIDIA, with extensive documentation and research
- Supports fine-tuning for custom voices
Cons of Tacotron2
- More complex to set up and use
- Requires significant computational resources
- Limited to text-to-speech, not real-time voice conversion
Code Comparison
Tacotron2 (PyTorch implementation):
from tacotron2.model import Tacotron2
from tacotron2.hparams import create_hparams
hparams = create_hparams()
model = Tacotron2(hparams)
Voice-changer (JavaScript implementation):
import { VoiceChanger } from 'voice-changer';
const voiceChanger = new VoiceChanger();
voiceChanger.initialize();
Key Differences
- Tacotron2 focuses on text-to-speech synthesis, while Voice-changer is designed for real-time voice conversion
- Tacotron2 is implemented in Python using PyTorch, whereas Voice-changer is primarily JavaScript-based
- Tacotron2 requires more setup and computational resources, while Voice-changer is more lightweight and easier to integrate into web applications
Use Cases
- Tacotron2: High-quality text-to-speech systems, custom voice synthesis for virtual assistants
- Voice-changer: Real-time voice modification for gaming, streaming, or voice chat applications
resemble-ai/Resemblyzer: A Python package to analyze and compare voices with deep learning
Pros of Resemblyzer
- Focused on speaker verification and embedding extraction
- Lightweight and easy to integrate into other projects
- Well-documented with clear examples and use cases
Cons of Resemblyzer
- Limited to speaker recognition tasks, not full voice changing
- Less active development and community support
- Fewer features for real-time voice manipulation
Code Comparison
Resemblyzer:
from resemblyzer import VoiceEncoder, preprocess_wav
encoder = VoiceEncoder()
wav = preprocess_wav(wav_fpath)
embedding = encoder.embed_utterance(wav)
Voice Changer:
from voice_changer import VoiceChanger
vc = VoiceChanger()
changed_voice = vc.change_voice(input_audio, target_voice)
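Resemblyzer's embed_utterance returns a fixed-size embedding, and speaker verification then reduces to comparing embeddings, typically by cosine similarity. A self-contained sketch of that comparison, with random vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in [-1, 1]; values near 1 suggest the same speaker."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
speaker = rng.normal(size=256)                 # stand-in for an embedding
same = speaker + 0.05 * rng.normal(size=256)   # slightly perturbed: same speaker
other = rng.normal(size=256)                   # independent: different speaker

print(cosine_similarity(speaker, same) > cosine_similarity(speaker, other))
```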
Key Differences
- Resemblyzer is primarily for speaker recognition and embedding generation, while Voice Changer focuses on voice transformation and manipulation.
- Voice Changer offers more comprehensive voice modification features, including pitch shifting and voice conversion.
- Resemblyzer is better suited for tasks like speaker identification and verification, while Voice Changer is designed for voice alteration and synthesis.
Use Cases
Resemblyzer is ideal for:
- Speaker verification systems
- Voice-based authentication
- Speaker clustering and diarization
Voice Changer is better for:
- Voice conversion applications
- Real-time voice modification
- Creating voice effects and alterations
README
Japanese / English / Korean / Chinese / German / Arabic / Greek / Spanish / French / Italian / Latin / Malay / Russian (*languages other than Japanese are machine-translated)
VCClient
VCClient is software that performs real-time voice conversion using AI.
What's New!
- v.2.0.76-beta
  - New features:
    - Beatrice: speaker merging
    - Beatrice: automatic pitch shift
  - Bug fixes:
    - Fixed a device-selection issue in server mode
- v.2.0.73-beta
  - New features:
    - Download of edited Beatrice models
  - Bug fixes:
    - Fixed a bug where Beatrice v2 pitch and formant settings were not applied
    - Fixed a bug where models using Applio's embedder could not be exported to ONNX
Downloads and related links
The Windows and M1 Mac builds can be downloaded from the Hugging Face repository.
*1 On Linux, clone the repository to use it.
Related links
- Beatrice V2 training code repository
- Beatrice V2 training code, Colab edition
Related software
- Real-time voice changer VCClient
- Text-to-speech software TTSClient
- Real-time speech recognition software ASRClient
Features of VC Client
Supports a variety of AI models
AI model | v.2 | v.1 | License |
---|---|---|---|
RVC | supported | supported | See the repository. |
Beatrice v1 | n/a | supported (Windows only) | Proprietary |
Beatrice v2 | supported | n/a | Proprietary |
MMVC | n/a | supported | See the repository. |
so-vits-svc | n/a | supported | See the repository. |
DDSP-SVC | n/a | supported | See the repository. |
Supports both standalone and networked configurations
Voice conversion can run entirely on the local PC or over a network. Running it over a network lets you offload the voice-conversion workload to an external machine, which helps when using it alongside high-load applications such as games.
Runs on multiple platforms
Windows, Mac (M1), Linux, Google Colab
*1 On Linux, clone the repository to use it.
Provides a REST API
Clients can be written in any programming language.
The API can also be driven with HTTP clients that ship with the OS, such as curl.
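Since this page does not document the endpoint layout, the snippet below only sketches how a client would drive such a REST API from Python; the /api/convert path, the pitch parameter, and port 18888 are assumptions for illustration, not VCClient's documented interface:

```python
import urllib.request

def build_convert_request(host: str, wav_bytes: bytes, pitch: int = 0):
    """Build (but do not send) a POST carrying raw audio to a
    hypothetical /api/convert endpoint; adapt to the real API."""
    url = f"http://{host}/api/convert?pitch={pitch}"
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

req = build_convert_request("localhost:18888", b"\x00\x00", pitch=2)
print(req.get_method(), req.full_url)
# Sending would be urllib.request.urlopen(req) against a running server.
```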
Troubleshooting
About the developer signature
This software is not signed by its developer, so a warning like the one below appears. You can still run it by clicking the icon while holding the Control key. This is a consequence of Apple's security policy; run it at your own risk.
Acknowledgments
The voice synthesis in this software uses voice data published free of charge by the free-material character "Tsukuyomi-chan".
■ Tsukuyomi-chan Corpus (CV. Rei Yumesaki)
https://tyc.rei-yumesaki.net/material/corpus/
© Rei Yumesaki
Terms of Use
- For the real-time voice changer Tsukuyomi-chan, in accordance with the terms of use of the Tsukuyomi-chan Corpus, using the converted audio for the following purposes is prohibited:
■ Criticizing or attacking others (the definition of "criticizing or attacking" follows the Tsukuyomi-chan character license).
■ Advocating for or against particular political positions, religions, or ideologies.
■ Publishing strongly graphic or provocative content without zoning.
■ Publishing it in a form that allows others to reuse it (as source material).
* Distributing or selling it as a work for appreciation is not a problem.
- For the real-time voice changer Amitaro, the following terms of use of Amitaro's voice-material studio apply. Details here:
It is fine to build voice models from Amitaro's voice materials or corpus recordings, and to use a voice changer or voice-conversion tool to turn your own voice into Amitaro's.
In that case, however, you must always state clearly that the voice has been converted into Amitaro's (or Koharune Ami's) voice, and make it obvious to everyone that it is not Amitaro (or Koharune Ami) who is actually speaking.
Also, anything spoken in Amitaro's voice must stay within the scope of the voice material's terms of use; do not make sensitive statements.
- For the real-time voice changer Kikoto Mahiro, the terms of use of Replica Doll apply. Details here.
Disclaimer
We accept no liability whatsoever for any direct, indirect, consequential, resulting, or special damages arising from the use of, or the inability to use, this software.