raidendotai/openv0

AI generated UI components

3,421 stars

Top Related Projects

  • whisper (69,530 stars) - Robust Speech Recognition via Large-Scale Weak Supervision
  • whisper.cpp - Port of OpenAI's Whisper model in C/C++
  • whisperX (11,304 stars) - WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
  • faster-whisper - Faster Whisper transcription with CTranslate2
  • Const-me/Whisper (8,065 stars) - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
  • jukebox (7,798 stars) - Code for the paper "Jukebox: A Generative Model for Music"

Quick Overview

openv0 is an open-source generative UI component framework. It uses large language models to generate and iterate on UI components, with live preview, for frontend frameworks such as React, Next.js, and Svelte, drawing on open-source component libraries like NextUI, Flowbite, and Shadcn. The project is in its early stages and welcomes community contributions to improve and expand its capabilities.

Pros

  • Fully open source and transparent, allowing community-driven development and customization
  • Generates UI components from natural-language prompts, with live preview for fast iteration
  • Highly modular multipass pipeline in which every pass is an independent plugin, giving researchers and developers a foundation to build on and experiment with
  • Builds on existing open-source component and icon libraries (NextUI, Flowbite, Shadcn, Lucide)

Cons

  • Still in early development, with limited functionality compared to more established tools
  • Requires an OpenAI API key, so generation depends on a paid external service
  • Documentation and examples are currently limited, which can make it challenging for newcomers to get started
  • Output quality may not yet match that of proprietary or more mature UI-generation tools

Code Examples

openv0 is driven from the command line and a local web app rather than imported as a library. The examples below mirror the install instructions later on this page; check the official repository for the most up-to-date usage.

  1. Install and configure openv0:

```shell
npx openv0@latest
```

  2. Start the local server and the web app:

```shell
# in one terminal
cd server && node api.js
# in another terminal
cd webapp && npm run dev
```

  3. Open http://localhost:5173/ in your browser and describe the component you want to generate.

Getting Started

To get started with openv0, follow these steps:

  1. Install openv0 (this downloads it, configures it based on your choices, and installs dependencies):

```shell
npx openv0@latest
```

  2. Add your OpenAI API key to server/.env, then start the local server and the web app:

```shell
# in one terminal
cd server && node api.js
# in another terminal
cd webapp && npm run dev
```

  3. Open http://localhost:5173/ in your browser and generate your first component.

Note: As the project is in its early stages, make sure to check the official repository for the most up-to-date installation and usage instructions.

Competitor Comparisons

whisper (69,530 stars) - Robust Speech Recognition via Large-Scale Weak Supervision

Pros of Whisper

  • More mature and widely adopted project with extensive documentation
  • Supports a broader range of languages and accents
  • Offers pre-trained models for immediate use

Cons of Whisper

  • Larger model size, requiring more computational resources
  • Less flexibility for customization and fine-tuning
  • Primarily focused on speech recognition, with limited additional features

Code Comparison

Whisper:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```

openv0 (a Node.js CLI and local server rather than a Python library):

```shell
npx openv0@latest
cd server && node api.js
```

Key Differences

  • Whisper is a speech recognition model, while openv0 is a generative UI component framework; they solve different problems
  • openv0 focuses on lightweight setup and interactive iteration with live preview
  • Whisper ships multiple model sizes, whereas openv0 delegates language modeling to the OpenAI API

Use Cases

  • Whisper: ideal for production-ready speech recognition tasks across various languages
  • openv0: suited to AI-generating React, Next.js, or Svelte UI components from natural-language prompts

whisper.cpp - Port of OpenAI's Whisper model in C/C++

Pros of whisper.cpp

  • Highly optimized C++ implementation, offering excellent performance
  • Supports various platforms and architectures, including mobile devices
  • Provides both command-line and library interfaces for flexibility

Cons of whisper.cpp

  • Limited to speech recognition, a different problem domain from openv0's UI generation
  • Requires more technical expertise to set up and use than a typical npm or pip package
  • Less focus on integration with other AI models or services

Code Comparison

whisper.cpp:

```cpp
#include "whisper.h"

int main(int argc, char ** argv) {
    struct whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin");
    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    // pcmf32 holds the 16 kHz mono float samples; audio loading is omitted here
    whisper_full(ctx, wparams, pcmf32.data(), pcmf32.size());
    whisper_print_timings(ctx);
    whisper_free(ctx);
    return 0;
}
```

openv0 (a Node.js CLI and local server rather than a Python library):

```shell
npx openv0@latest
cd server && node api.js
```

The snippets reflect the projects' different focus areas: whisper.cpp is a low-level speech recognition library, while openv0 is a command-line tool and local server for generating UI components.

WhisperX (11,304 stars) - Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Pros of WhisperX

  • Specialized in audio transcription and alignment, offering more advanced features for this specific task
  • Provides word-level timestamps and speaker diarization capabilities
  • Actively maintained with regular updates and improvements

Cons of WhisperX

  • Limited to audio processing, a different problem domain from openv0's UI generation
  • May require more computational resources for advanced features like speaker diarization
  • Needs separate alignment models on top of the base transcription model

Code Comparison

WhisperX:

```python
import whisperx

device = "cuda"
model = whisperx.load_model("large-v2", device)
result = model.transcribe("audio.mp3")

# word-level alignment uses a separate alignment model
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, "audio.mp3", device)
```

openv0 (a Node.js CLI and local server; components are generated through its web app rather than a Python API):

```shell
npx openv0@latest
cd server && node api.js
```

WhisperX is the better fit for projects that need detailed audio analysis with word-level timestamps. openv0 targets an unrelated task: generating and iterating on UI components with live preview.

faster-whisper - Faster Whisper transcription with CTranslate2

Pros of faster-whisper

  • Optimized for speed, offering faster transcription performance
  • Supports multiple languages and provides language detection
  • Implements efficient CPU and GPU inference

Cons of faster-whisper

  • Focused solely on speech recognition, a different problem domain from openv0's UI generation
  • May require more setup and dependencies for optimal performance
  • Limited to audio input, not designed for multi-modal tasks

Code Comparison

faster-whisper:

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

openv0 (a Node.js CLI and web app rather than a Python library):

```shell
npx openv0@latest
cd webapp && npm run dev
```

Summary

faster-whisper excels at speech recognition, offering optimized performance and multi-language support. openv0 addresses an unrelated problem: AI-generating UI components for React, Next.js, and Svelte. The choice between them comes down to the task at hand, transcription versus frontend generation.

Const-me/Whisper (8,065 stars) - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

Pros of Const-me/Whisper

  • Optimized for performance with GPU acceleration
  • Supports multiple languages and provides language detection
  • Offers both streaming and batch processing capabilities

Cons of Const-me/Whisper

  • Limited to speech recognition and transcription tasks
  • Requires more setup and configuration compared to OpenV0
  • May have higher system requirements due to GPU optimization

Code Comparison

openv0 (a Node.js CLI and local server rather than a Python library):

```shell
npx openv0@latest
cd server && node api.js
```

Const-me/Whisper (the call sequence below is illustrative, written in whisper.cpp-style C API; the project itself exposes a COM-style C++/C# interface):

```cpp
whisper_context * ctx = whisper_init_from_file("model.bin");
whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
// pcm / n_samples: 16 kHz mono audio samples; loading omitted
whisper_full(ctx, params, pcm, n_samples);
```

Summary

Const-me/Whisper focuses on high-performance, GPU-accelerated speech recognition, supporting multiple languages with both streaming and batch processing. openv0 is unrelated in scope: it generates UI components through a local server and web app. The choice between them depends entirely on whether the task is transcription or frontend generation.

jukebox (7,798 stars) - Code for the paper "Jukebox: A Generative Model for Music"

Pros of Jukebox

  • More advanced and established project for AI music generation
  • Backed by OpenAI, with extensive research and documentation
  • Capable of generating high-quality, multi-instrumental music samples

Cons of Jukebox

  • Requires significant computational resources to run
  • Less focused on real-time generation or interactive use
  • More complex to set up and use for non-technical users

Code Comparison

Jukebox:

```python
import jukebox
from jukebox.make_models import make_vqvae, make_prior

# call signatures abbreviated; hps (the hyperparameter set) must be
# constructed first via jukebox's hyperparameter helpers
vqvae = make_vqvae(hps)
prior = make_prior(hps)
```

openv0 (a Node.js CLI, not a Python library):

```shell
npx openv0@latest
```

Key Differences

  • Jukebox is designed for music generation, while openv0 generates UI components; the two serve different purposes
  • openv0 aims for easy setup (a single npx command) and interactive iteration with live preview
  • Jukebox offers fine-grained control over musical elements; openv0 works from natural-language prompts
  • Jukebox sampling typically requires long processing times and significant compute


README

openv0

project website - openv0.com

openv0 is a generative UI component framework

It lets you AI-generate and iterate on UI components, with live preview.

  • openv0 makes use of open-source component libraries and icons to build a library of assets for the generative pipeline
  • openv0 is highly modular and structured for elaborate generative processes
  • Component generation is a multipass pipeline, where every pass is a fully independent plugin
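As a rough illustration of that multipass idea, each pass can be modeled as an independent function that takes the work-in-progress component and returns an updated version. This is a hypothetical sketch, not openv0's actual plugin API; the `ComponentDraft` and `Pass` types and the pass names are assumptions.

```typescript
// Illustrative sketch of a multipass generation pipeline.
// The Pass type and pass names are hypothetical, not openv0's real plugin API.
interface ComponentDraft {
  prompt: string;
  code: string;
  notes: string[];
}

// Every pass is fully independent: it only sees the draft it receives.
type Pass = (draft: ComponentDraft) => ComponentDraft;

const generatePass: Pass = (draft) => ({
  ...draft,
  code: `// component generated for: ${draft.prompt}`,
  notes: [...draft.notes, "generate"],
});

const validatePass: Pass = (draft) => ({
  ...draft,
  notes: [...draft.notes, "validate"],
});

function runPipeline(prompt: string, passes: Pass[]): ComponentDraft {
  const initial: ComponentDraft = { prompt, code: "", notes: [] };
  return passes.reduce((draft, pass) => pass(draft), initial);
}

const result = runPipeline("pricing card", [generatePass, validatePass]);
console.log(result.notes.join(" -> ")); // prints: generate -> validate
```

Because each pass depends only on the draft it receives, passes can be added, removed, or reordered without touching the others, which is what makes a plugin-per-pass design attractive.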

(say hi @n_raidenai 👋)



Currently Supported

  • Frontend frameworks
    • React
    • Next.js
    • Svelte
  • UI libraries
    • NextUI
    • Flowbite
    • Shadcn
  • Icons libraries
    • Lucide

The latest openv0 update makes it easier to integrate new frameworks, libraries and plugins.

Docs & guides on how to do so will be posted soon.

Next updates:

  • public explore+share web app on openv0.com (you can use the openv0 share API already)
  • multimodal UIray vision model (more details soon)
  • better validation passes, more integrations & plugins

Demos

Current version

https://github.com/raidendotai/openv0/assets/127366981/a249cf0d-ae44-4155-a5c1-fc2528bf05b5

Previous version

openv0_demo.webm


Install

  • Open your terminal and run:

```shell
npx openv0@latest
```

It will download openv0, configure it based on your choices, and install dependencies. Then:

  • Start the local server + webapp
    • start the server: cd server && node api.js
    • start the webapp: cd webapp && npm run dev
  • Open your web browser and go to http://localhost:5173/

That is all. Have fun 🎉


Alternatively, you can clone this repo and install manually.

To do so:

  • Clone the repo, then run npm i in server/
  • Unzip server/library/icons/lucide/vectordb/index.zip into that same folder
  • Configure your OpenAI key in server/.env
  • Web app starter templates are in webapps-starters/
    • run npm i in the web app starter of your choice
    • make sure the WEBAPP_ROOT variable in server/.env matches your webapp folder path
  • Start the server with node api.js and the web app with npm run dev
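For reference, a minimal server/.env might look like the sketch below. WEBAPP_ROOT is the variable named above; the name of the OpenAI key variable is an assumption, so verify both against the example config in the repository.

```shell
# server/.env - illustrative sketch; verify variable names against the repo
OPENAI_API_KEY=your-key-here            # assumed name for the OpenAI key entry
WEBAPP_ROOT=../webapps-starters/react   # must match your webapp folder path
```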

Try openv0

You can try openv0 (using React as the framework) with minimal configuration below.

Replit

Run on Repl.it

StackBlitz

Run on StackBlitz


How It Works

Multipass Workflow

The following image gives a simple overview of the multipass process:

openv0_process

Codebase

A YouTube video by @elie2222 explains parts of the previous openv0 codebase.