
w-okada / voice-changer

Realtime Voice Changer


Top Related Projects

  • Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
  • fairseq: Facebook AI Research sequence-to-sequence toolkit written in Python
  • TTS: Deep learning for Text to Speech (discussion forum: https://discourse.mozilla.org/c/tts)
  • Tacotron2: PyTorch implementation of Tacotron 2 with faster-than-realtime inference
  • Resemblyzer: A Python package to analyze and compare voices with deep learning

Quick Overview

The w-okada/voice-changer repository is a Python-based project that provides tools for real-time voice conversion. It allows users to modify the pitch, timbre, and other characteristics of a voice, either on live audio input or on pre-recorded audio files.

Pros

  • Real-time processing: The library supports real-time voice conversion, enabling live applications such as video conferencing or voice chat.
  • Flexible configuration: Users can customize the voice conversion parameters to achieve a wide range of voice transformations.
  • Cross-platform compatibility: The library can be used on various operating systems, including Windows, macOS, and Linux.
  • Open-source: The project is open-source, allowing for community contributions and further development.

Cons

  • Complexity: The library may have a steep learning curve for users unfamiliar with audio processing and machine learning concepts.
  • Performance limitations: Depending on the hardware and the complexity of the voice conversion, real-time performance may not be achievable on all devices (see the latency arithmetic after this list).
  • Limited pre-trained models: The library currently provides a limited set of pre-trained voice conversion models, which may not cover all desired voice transformations.
  • Potential privacy concerns: Users should be aware of the potential privacy implications when using voice conversion technology, especially in live applications.
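
On the performance point, buffering latency alone scales linearly with the audio chunk size, so a quick back-of-the-envelope check is useful when judging real-time feasibility. A minimal sketch with illustrative numbers (not library defaults):

# Latency contributed by audio buffering alone: chunk_samples / sample_rate.
# The chunk size and sample rate below are illustrative assumptions.
chunk_samples = 4096
sample_rate = 48_000  # Hz
latency_ms = chunk_samples / sample_rate * 1000
print(f"{latency_ms:.1f} ms of buffering latency per chunk")  # ~85.3 ms

Model inference time adds on top of this, which is why smaller chunks trade throughput for responsiveness.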

Code Examples

Here are a few illustrative examples of how the w-okada/voice-changer library can be used (the high-level Python API shown here is simplified for readability):

  1. Real-time voice conversion:
from voice_changer import VoiceChanger

vc = VoiceChanger()
vc.start_stream()

while True:
    audio_frame = vc.get_audio_frame()
    converted_audio = vc.convert_voice(audio_frame)
    # Process the converted audio (e.g., play it, save it to a file)
  2. Batch voice conversion:
from voice_changer import VoiceChanger

vc = VoiceChanger()
vc.load_audio_file('input_audio.wav')
converted_audio = vc.convert_voice()
vc.save_audio_file('output_audio.wav')
  3. Adjusting voice conversion parameters (see the semitone-to-ratio note after these examples):
from voice_changer import VoiceChanger

vc = VoiceChanger()
vc.set_pitch_shift(2.0)  # Shift the pitch up by 2 semitones
vc.set_timbre_shift(0.5)  # Shift the timbre by 0.5
converted_audio = vc.convert_voice()
  4. Using pre-trained voice conversion models:
from voice_changer import VoiceChanger

vc = VoiceChanger(model_name='male_to_female')
vc.load_audio_file('input_audio.wav')
converted_audio = vc.convert_voice()
vc.save_audio_file('output_audio.wav')
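
For reference on the pitch parameter in example 3: a shift of n semitones corresponds to multiplying every frequency by 2^(n/12). A minimal sketch (pure arithmetic, independent of the library):

# A +2 semitone shift scales frequencies by 2**(2/12) ≈ 1.122, i.e. about +12%.
n_semitones = 2.0
ratio = 2 ** (n_semitones / 12)
print(f"frequency ratio for +{n_semitones:g} semitones: {ratio:.3f}")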

Getting Started

To get started with the w-okada/voice-changer library, follow these steps:

  1. Install the library using pip:
pip install voice-changer
  2. Import the VoiceChanger class and create an instance:
from voice_changer import VoiceChanger

vc = VoiceChanger()
  3. Load an audio file or start a real-time audio stream:
vc.load_audio_file('input_audio.wav')
# or
vc.start_stream()
  4. Convert the voice using the convert_voice() method:
converted_audio = vc.convert_voice()
  5. Save the converted audio to a file:
vc.save_audio_file('output_audio.wav')
  6. Adjust the voice conversion parameters as needed:
vc.set_pitch_shift(2.0)
vc.set_timbre_shift(0.5)
  7. Explore the available pre-trained voice conversion models:
vc = VoiceChanger(model_name='male_to_female')

Competitor Comparisons

Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time

Pros of Real-Time-Voice-Cloning

  • More comprehensive voice cloning solution, including voice encoding and synthesis
  • Implements a pre-trained speaker verification model for enhanced accuracy
  • Provides a graphical user interface for easier interaction

Cons of Real-Time-Voice-Cloning

  • Less focus on real-time processing compared to voice-changer
  • May require more computational resources due to its comprehensive approach
  • Limited to English language support

Code Comparison

Real-Time-Voice-Cloning:

import torch

# SpeakerEncoder is defined in the Real-Time-Voice-Cloning repository's encoder package.
def load_model(weights_fpath):
    model = SpeakerEncoder()
    checkpoint = torch.load(weights_fpath)
    model.load_state_dict(checkpoint["model_state"])
    return model

voice-changer:

import numpy as np

def get_audio_data(self):
    # Read one chunk from the input stream and return it as 16-bit PCM samples.
    data = self.stream.read(self.chunk)
    return np.frombuffer(data, dtype=np.int16)

The Real-Time-Voice-Cloning code snippet shows model loading, while the voice-changer code focuses on real-time audio processing. This highlights the different priorities of each project, with Real-Time-Voice-Cloning emphasizing comprehensive voice cloning and voice-changer prioritizing real-time functionality.

fairseq: Facebook AI Research sequence-to-sequence toolkit written in Python

Pros of fairseq

  • Comprehensive sequence-to-sequence modeling toolkit with support for various tasks
  • Highly optimized and efficient implementation for large-scale training
  • Extensive documentation and active community support

Cons of fairseq

  • Steeper learning curve due to its complexity and wide range of features
  • Requires more computational resources for training and inference
  • Less focused on real-time voice conversion applications

Code Comparison

fairseq:

from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained('/path/to/model')
translated = model.translate('Hello world!')
print(translated)

voice-changer:

from voice_changer import VoiceChanger

vc = VoiceChanger(model_path='/path/to/model')
converted_audio = vc.convert('input_audio.wav', target_speaker='speaker_id')
vc.save_audio(converted_audio, 'output_audio.wav')

Key Differences

  • fairseq is a general-purpose sequence-to-sequence toolkit, while voice-changer focuses specifically on voice conversion
  • fairseq offers more flexibility and customization options, but voice-changer provides a simpler API for voice conversion tasks
  • fairseq is better suited for research and large-scale applications, while voice-changer is more accessible for quick voice conversion projects

TTS: Deep learning for Text to Speech (discussion forum: https://discourse.mozilla.org/c/tts)

Pros of TTS

  • Comprehensive text-to-speech toolkit with multiple models and languages
  • Well-documented and actively maintained by Mozilla
  • Supports both training and inference

Cons of TTS

  • Focused solely on text-to-speech, lacking voice conversion capabilities
  • May require more technical expertise to set up and use effectively
  • Larger project scope, potentially overwhelming for simple use cases

Code Comparison

TTS:

from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Hello world!", file_path="output.wav")

voice-changer:

from voice_changer import VoiceChanger

vc = VoiceChanger(model_path="path/to/model")
vc.convert("input.wav", "output.wav", target_speaker="speaker_id")

Key Differences

  • TTS focuses on generating speech from text, while voice-changer primarily handles voice conversion
  • TTS offers a wider range of pre-trained models and languages
  • voice-changer provides a simpler interface for voice conversion tasks
  • TTS is better suited for large-scale text-to-speech applications
  • voice-changer is more appropriate for real-time voice modification and conversion

Tacotron2: PyTorch implementation of Tacotron 2 with faster-than-realtime inference

Pros of Tacotron2

  • Highly advanced text-to-speech synthesis model
  • Backed by NVIDIA, with extensive documentation and research
  • Supports fine-tuning for custom voices

Cons of Tacotron2

  • More complex to set up and use
  • Requires significant computational resources
  • Limited to text-to-speech, not real-time voice conversion

Code Comparison

Tacotron2 (PyTorch implementation):

from tacotron2.model import Tacotron2
from tacotron2.hparams import create_hparams

hparams = create_hparams()
model = Tacotron2(hparams)

voice-changer (JavaScript client):

import { VoiceChanger } from 'voice-changer';

const voiceChanger = new VoiceChanger();
voiceChanger.initialize();

Key Differences

  • Tacotron2 focuses on text-to-speech synthesis, while voice-changer is designed for real-time voice conversion
  • Tacotron2 is implemented in Python using PyTorch, whereas voice-changer pairs its Python backend with a JavaScript client
  • Tacotron2 requires more setup and computational resources, while voice-changer is more lightweight and easier to integrate into web applications

Use Cases

  • Tacotron2: High-quality text-to-speech systems, custom voice synthesis for virtual assistants
  • Voice-changer: Real-time voice modification for gaming, streaming, or voice chat applications

Resemblyzer: A Python package to analyze and compare voices with deep learning

Pros of Resemblyzer

  • Focused on speaker verification and embedding extraction
  • Lightweight and easy to integrate into other projects
  • Well-documented with clear examples and use cases

Cons of Resemblyzer

  • Limited to speaker recognition tasks, not full voice changing
  • Less active development and community support
  • Fewer features for real-time voice manipulation

Code Comparison

Resemblyzer:

from resemblyzer import VoiceEncoder, preprocess_wav
encoder = VoiceEncoder()
wav = preprocess_wav(wav_fpath)  # wav_fpath: path to an input audio file
embedding = encoder.embed_utterance(wav)

Voice Changer:

from voice_changer import VoiceChanger
vc = VoiceChanger()
changed_voice = vc.change_voice(input_audio, target_voice)

Key Differences

  • Resemblyzer is primarily for speaker recognition and embedding generation, while Voice Changer focuses on voice transformation and manipulation.
  • Voice Changer offers more comprehensive voice modification features, including pitch shifting and voice conversion.
  • Resemblyzer is better suited for tasks like speaker identification and verification, while Voice Changer is designed for voice alteration and synthesis.

Use Cases

Resemblyzer is ideal for:

  • Speaker verification systems (see the sketch after these lists)
  • Voice-based authentication
  • Speaker clustering and diarization

Voice Changer is better for:

  • Voice conversion applications
  • Real-time voice modification
  • Creating voice effects and alterations
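
As a concrete illustration of the speaker-verification use case noted above, Resemblyzer's documented API makes a minimal same-speaker check only a few lines. The file names and the 0.75 decision threshold below are illustrative assumptions:

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()
# preprocess_wav resamples the file and trims silence before embedding.
emb_enrolled = encoder.embed_utterance(preprocess_wav("enrolled_speaker.wav"))
emb_unknown = encoder.embed_utterance(preprocess_wav("unknown_speaker.wav"))

# Utterance embeddings are L2-normalized, so a dot product is cosine similarity.
similarity = float(np.dot(emb_enrolled, emb_unknown))
print(f"cosine similarity: {similarity:.3f}")
print("same speaker" if similarity > 0.75 else "different speaker")  # illustrative threshold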


README

Japanese / English / Korean / Chinese / German / Arabic / Greek / Spanish / French / Italian / Latin / Malay / Russian *Versions other than Japanese are machine translations.

VCClient

VCClient is software that performs real-time voice conversion using AI.

What's New!

  • v.2.0.76-beta
    • New features:
      • Beatrice: speaker merging implemented
      • Beatrice: automatic pitch shift
    • Bug fixes:
      • Fixed an issue when selecting devices in server mode
  • v.2.0.73-beta
    • New features:
      • Download of edited Beatrice models
    • Bug fixes:
      • Fixed a bug where Beatrice v2 pitch and formant settings were not applied
      • Fixed a bug where models using the Applio embedder could not be exported to ONNX

Downloads and Related Links

The Windows and M1 Mac versions can be downloaded from the Hugging Face repository.

*1 For Linux, clone the repository and run it from source.

Related Links

Related Software

Features of VC Client

Support for a variety of AI models

AI model    | v.2       | v.1                  | License
RVC         | supported | supported            | see the repository
Beatrice v1 | n/a       | supported (Win only) | proprietary
Beatrice v2 | supported | n/a                  | proprietary
MMVC        | n/a       | supported            | see the repository
so-vits-svc | n/a       | supported            | see the repository
DDSP-SVC    | n/a       | supported            | see the repository

Support for both standalone and networked configurations

Voice conversion can run entirely on the local PC or across a network. Running it across the network lets you offload the conversion workload to another machine when using it alongside high-load applications such as games.


Support for multiple platforms

Windows, Mac (M1), Linux, Google Colab

*1 For Linux, clone the repository and run it from source.

Provides a REST API

Clients can be written in any programming language.

The server can also be driven with HTTP clients built into the OS, such as curl.
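
As a minimal sketch of driving the server from Python (the port and endpoint path below are assumptions for illustration, not the documented API; consult the server's API listing for the real routes):

import requests

# Assumed default: point BASE_URL at wherever the VCClient server is running.
# Using a remote host here gives the network-offload setup described above.
BASE_URL = "http://localhost:18888"

# Hypothetical status endpoint, for illustration only.
resp = requests.get(f"{BASE_URL}/api/hello", timeout=5)
resp.raise_for_status()
print(resp.json())

The same request works from the command line, e.g. curl http://localhost:18888/api/hello.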

Troubleshooting

Communication issues

About the developer signature

This software is not signed by the developer. A warning like the one below appears on macOS, but you can run the app by clicking its icon while holding the Control key. This is due to Apple's security policy. Run it at your own risk.


Acknowledgments

  The voice synthesis in this software uses voice data published free of charge for the free character "Tsukuyomi-chan".
  ■ Tsukuyomi-chan Corpus (CV: Rei Yumesaki)
  https://tyc.rei-yumesaki.net/material/corpus/
  © Rei Yumesaki

Terms of Use

  • For the realtime voice changer Tsukuyomi-chan, in accordance with the terms of use of the Tsukuyomi-chan Corpus, using the converted audio for the following purposes is prohibited:

■ Criticizing or attacking other people. (The definition of "criticizing or attacking" follows the Tsukuyomi-chan character license.)

■ Advocating for or against particular political positions, religions, or ideologies.

■ Publishing strongly stimulating expressions without zoning.

■ Publishing the audio in a form that permits third parties to reuse it as material.
※ Distributing or selling it as a finished work for appreciation is not a problem.
  • For the realtime voice changer Amitaro, the following terms of use of Amitaro's voice material studio apply. Details here.
It is acceptable to build voice models from Amitaro's voice materials or corpus recordings, and to use a voice changer or voice conversion to turn your own voice into Amitaro's.

In that case, however, you must always make it clear that the voice has been converted into Amitaro's (or Koharune Ami's) voice, so that anyone can tell that it is not Amitaro (or Koharune Ami) actually speaking.
Anything spoken in Amitaro's voice must also stay within the scope of the voice material's terms of use, and sensitive statements must be avoided.
  • For the realtime voice changer Kikoto Mahiro, the terms of use of Replica Doll apply. Details here.

Disclaimer

We accept no liability for any direct, indirect, incidental, consequential, or special damages arising from the use of, or the inability to use, this software.