serve

☁️ Build multimodal AI applications with cloud-native stack

21,706

2,233


Top Related Projects

openai-python

27,567

The official Python library for the OpenAI API

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

haystack

21,304

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

elasticsearch

73,408

Free and Open Source, Distributed, RESTful Search Engine

Quick Overview

Jina is an open-source neural search framework for building cross-modal and multi-modal applications powered by deep learning. It allows developers to build scalable and cloud-native neural search solutions that can handle various data types, including text, images, video, and audio.

Pros

Supports multi-modal and cross-modal search capabilities
Highly scalable and cloud-native architecture
Provides a rich ecosystem of pre-built executors and integrations
Easy to use with a pythonic API and comprehensive documentation

Cons

Steep learning curve for beginners in neural search
Limited community support compared to more established search frameworks
May be overkill for simple search use cases
Requires significant computational resources for large-scale deployments

Code Examples

Creating a simple text search flow:

from jina import Flow, Document

f = Flow().add(uses='jinahub://SimpleIndexer')

with f:
    f.post('/index', Document(text='Hello, World!'))
    response = f.post('/search', Document(text='Hello'))
    print(response[0].matches[0].text)

Building an image search pipeline:

from jina import Flow, Document

f = (
    Flow()
    .add(uses='jinahub://CLIPImageEncoder')
    .add(uses='jinahub://SimpleIndexer')
)

with f:
    f.index(Document(uri='path/to/image.jpg'))
    response = f.search(Document(uri='path/to/query_image.jpg'))
    print(response[0].matches[0].uri)

Creating a multi-modal search flow:

from jina import Flow, Document

f = (
    Flow()
    .add(uses='jinahub://CLIPTextEncoder', name='text_encoder')
    .add(uses='jinahub://CLIPImageEncoder', name='image_encoder')
    .add(uses='jinahub://SimpleIndexer')
)

with f:
    f.index([
        Document(text='A cute cat'),
        Document(uri='path/to/cat_image.jpg')
    ])
    response = f.search(Document(text='Find me a cat picture'))
    print(response[0].matches[0].uri)

Getting Started

To get started with Jina, follow these steps:

Install Jina:

pip install jina

Create a new Python file (e.g., app.py) and import Jina:

from jina import Flow, Document

Define a simple flow and run a search:

f = Flow().add(uses='jinahub://SimpleIndexer')

with f:
    f.post('/index', Document(text='Hello, Jina!'))
    response = f.post('/search', Document(text='Hello'))
    print(response[0].matches[0].text)

Run the script:

python app.py

For more advanced usage and configurations, refer to the official Jina documentation.

Competitor Comparisons

openai-python

27,567

The official Python library for the OpenAI API

Pros of openai-python

Focused specifically on OpenAI's API, providing a streamlined interface
Extensive documentation and examples for various OpenAI services
Lightweight and easy to integrate into existing projects

Cons of openai-python

Limited to OpenAI's services, lacking versatility for other AI tasks
Requires API key and potentially costly usage of OpenAI's resources
Less flexibility for custom AI model deployment and management

Code Comparison

openai-python:

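from openai import OpenAI

# openai>=1.0 client interface; the model name below is only an example
client = OpenAI(api_key="your-api-key")
response = client.completions.create(model="gpt-3.5-turbo-instruct", prompt="Hello, world!")
print(response.choices[0].text)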

jina:

from jina import Flow, Document

f = Flow().add(uses='jinahub://CLIPTextEncoder')
with f:
    resp = f.post('/search', inputs=Document(text='Hello, world!'))
print(resp[0].matches)

The openai-python code performs text completion through OpenAI's API, while the Jina code builds a workflow from pluggable components. Jina offers more customization and scalability for complex AI tasks, but requires more setup and familiarity with its ecosystem. openai-python provides a simpler interface but is limited to OpenAI's services.

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of Transformers

Extensive library of pre-trained models for text, vision, audio, and multimodal tasks
Well-documented and widely adopted in the research community
Seamless integration with PyTorch and TensorFlow

Cons of Transformers

Provides models rather than serving infrastructure, so deployment and scaling need separate tooling
Can be resource-intensive for large models and datasets
Steeper learning curve for beginners in machine learning

Code Comparison

Transformers:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")[0]
print(f"Label: {result['label']}, Score: {result['score']:.4f}")

Jina:

from jina import Flow, Document

f = Flow().add(uses='jinahub://SimpleIndexer')
with f:
    resp = f.post('/index', Document(text='I love this product!'))
print(f"Indexed document: {resp[0].id}")

Key Differences

Transformers supplies models for text, vision, audio, and multimodal tasks, while Jina is a framework for building search and serving applications around such models
Jina offers a microservice architecture for scalable AI applications, whereas Transformers is primarily a model library
Transformers provides easy access to pre-trained models, while Jina emphasizes building end-to-end search solutions

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

Widely adopted in the machine learning community with extensive ecosystem
Powerful and flexible for deep learning research and production
Excellent GPU acceleration and distributed training capabilities

Cons of PyTorch

Steeper learning curve for beginners compared to Jina
Primarily focused on deep learning, less versatile for general AI applications
Requires more boilerplate code for certain tasks

Code Comparison

PyTorch example (basic neural network):

import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Single fully connected layer: 10 input features, 5 outputs
        self.fc = nn.Linear(10, 5)

    def forward(self, x):
        return self.fc(x)

Jina example (basic Flow):

from jina import Flow, Document

f = Flow().add(uses='jinahub://SimpleIndexer')
with f:
    f.post('/index', Document(text='Hello, World!'))

PyTorch is a powerful deep learning framework, while Jina is a cloud-native neural search framework. PyTorch offers more flexibility for custom neural network architectures, whereas Jina provides higher-level abstractions for building search and AI applications. PyTorch requires more low-level coding, while Jina emphasizes simplicity and rapid development for specific use cases.

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

Extensive ecosystem with robust tools and libraries
Highly optimized for large-scale machine learning and deep learning
Strong support for distributed computing and GPU acceleration

Cons of TensorFlow

Steeper learning curve, especially for beginners
Can be overkill for simpler machine learning tasks
Slower development cycle compared to more lightweight frameworks

Code Comparison

TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

Jina:

from jina import Flow, Document

f = Flow().add(uses='jinahub://SimpleIndexer')
with f:
    f.post('/index', Document(text='Hello, World!'))

TensorFlow is a comprehensive machine learning framework, while Jina is a neural search framework. TensorFlow excels in building and training complex neural networks, whereas Jina focuses on creating scalable neural search solutions. TensorFlow's code typically involves defining and training models, while Jina's code centers around creating flows and processing documents for search and retrieval tasks.

haystack

21,304

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Pros of Haystack

More focused on question answering and document retrieval tasks
Offers a wider range of pre-built pipelines for specific NLP tasks
Better documentation and tutorials for beginners

Cons of Haystack

Less flexible for general-purpose AI applications
Smaller community and fewer contributors compared to Jina

Code Comparison

Haystack example:

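# Haystack 1.x API
from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever, FARMReader

# Assumed store; the original snippet left document_store undefined
document_store = InMemoryDocumentStore()

pipeline = Pipeline()
pipeline.add_node(component=TfidfRetriever(document_store=document_store), name="Retriever", inputs=["Query"])
pipeline.add_node(component=FARMReader(model_name_or_path="deepset/roberta-base-squad2"), name="Reader", inputs=["Retriever"])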

Jina example:

from jina import Flow, Document

f = Flow().add(uses='jinahub://SimpleIndexer')
with f:
    f.post('/index', Document(text='Hello, World!'))
    f.post('/search', Document(text='Hello'))

Both frameworks offer easy-to-use pipelines for various NLP tasks, but Haystack is more specialized for question answering and document retrieval, while Jina provides a more flexible architecture for general AI applications. Haystack's code tends to be more explicit in defining pipeline components, whereas Jina's approach is more concise and modular.

elasticsearch

73,408

Free and Open Source, Distributed, RESTful Search Engine

Pros of Elasticsearch

Mature and battle-tested search engine with extensive documentation
Powerful full-text search capabilities and advanced querying options
Large ecosystem with numerous plugins and integrations

Cons of Elasticsearch

Steep learning curve and complex configuration
Resource-intensive, especially for large-scale deployments
Primarily focused on text-based search, less versatile for multimodal data

Code Comparison

Elasticsearch query example:

{
  "query": {
    "match": {
      "title": "search example"
    }
  }
}

Jina query example:

from jina import Client, Document

c = Client()
d = Document(text='search example')
results = c.search(d)

Key Differences

Jina is designed for multimodal and cross-modal search, while Elasticsearch excels in text-based search
Elasticsearch uses a RESTful API with JSON queries, whereas Jina uses a Python-native API
Jina focuses on neural search and deep learning models, while Elasticsearch relies more on traditional information retrieval techniques

Both projects have their strengths, with Elasticsearch being a robust choice for text-based search and Jina offering more flexibility for multimodal and AI-powered search applications.
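
To make the contrast between keyword matching and embedding-based ("neural") search concrete, here is a minimal sketch in plain Python. It uses neither Elasticsearch nor Jina; the three-dimensional vectors are hand-written stand-ins for model embeddings, and `keyword_search` and `vector_search` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "embeddings" written by hand; a real system would compute them with a model.
index = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "stock market report": [0.0, 0.1, 0.9],
}


def keyword_search(query, docs):
    """Return documents sharing at least one exact term with the query."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]


def vector_search(query_vec, index, top_k=1):
    """Return the top_k documents ranked by cosine similarity to query_vec."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, index[d]), reverse=True)
    return ranked[:top_k]


# "kitten" shares no exact term with any document, so keyword search finds nothing.
kw = keyword_search("kitten", index)
# A query vector close to the cat embedding still retrieves the cat document.
vec = vector_search([0.8, 0.2, 0.0], index)
print(kw, vec)
```

The keyword lookup depends on exact term overlap, while the vector lookup depends only on how close the query embedding is to each stored embedding.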


README

Jina-Serve

Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.

Key Features

Native support for all major ML frameworks and data types
High-performance service design with scaling, streaming, and dynamic batching
LLM serving with streaming output
Built-in Docker integration and Executor Hub
One-click deployment to Jina AI Cloud
Enterprise-ready with Kubernetes and Docker Compose support

Comparison with FastAPI

Key advantages over FastAPI:

DocArray-based data handling with native gRPC support
Built-in containerization and service orchestration
Seamless scaling of microservices
One-command cloud deployment

Install

pip install jina

See guides for Apple Silicon and Windows.

Core Concepts

Three main layers:

Data: BaseDoc and DocList for input/output
Serving: Executors process Documents, Gateway connects services
Orchestration: Deployments serve Executors, Flows create pipelines

Build AI Services

Let's create a gRPC-based AI service using StableLM:

from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
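        for prompt, output in zip(prompts, llm_outputs):
            # For a list of prompts, each output is a list of dicts
            # such as [{'generated_text': '...'}]
            generations.append(
                Generation(prompt=prompt, text=output[0]['generated_text'])
            )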
        return generations

Deploy with Python or YAML:

from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()

jtype: Deployment
with:
 uses: StableLM
 py_modules:
   - executor.py
 timeout_ready: -1
 port: 12345

Use the client:

from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])

Build Pipelines

Chain services into a Flow:

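from jina import Flow
from executor import StableLM
# Assumes a TextToImage Executor defined in text_to_image.py (not shown here)
from text_to_image import TextToImage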

flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
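
As a rough mental model of what chaining does, the sketch below passes the output documents of one stage to the next stage in order. It is plain Python, not Jina's implementation: `MiniFlow`, `Upper`, and `Exclaim` are hypothetical names, and a real Flow runs each Executor as its own service behind a Gateway.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str


class Upper:
    """Toy stage: upper-cases every document."""

    def process(self, docs):
        return [Doc(d.text.upper()) for d in docs]


class Exclaim:
    """Toy stage: appends an exclamation mark."""

    def process(self, docs):
        return [Doc(d.text + "!") for d in docs]


class MiniFlow:
    """Hypothetical stand-in for a pipeline: stages run in the order added."""

    def __init__(self):
        self.stages = []

    def add(self, stage):
        self.stages.append(stage)
        return self  # returning self allows .add(...).add(...) chaining

    def post(self, docs):
        for stage in self.stages:
            docs = stage.process(docs)  # output of one stage feeds the next
        return docs


flow = MiniFlow().add(Upper()).add(Exclaim())
result = flow.post([Doc("hello")])
print(result[0].text)
```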

Scaling and Deployment

Local Scaling

Boost throughput with built-in features:

Replicas for parallel processing
Shards for data partitioning
Dynamic batching for efficient model inference
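
The difference between replicas and shards can be sketched in plain Python. Replicas are identical copies, so a request can go to any of them (round-robin here). Shards each hold part of the data, so something must decide which shard owns a document. The hash-based partitioning below is an assumption chosen for illustration, not a description of how Jina routes requests, and `ReplicaPool` and `shard_for` are hypothetical names.

```python
import itertools
import zlib


class ReplicaPool:
    """Round-robin selection over n identical replicas."""

    def __init__(self, n):
        self._cycle = itertools.cycle(range(n))

    def pick(self):
        return next(self._cycle)


def shard_for(key: str, num_shards: int) -> int:
    """Map a document key to a shard with a stable hash (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


pool = ReplicaPool(2)
# Five requests alternate between replica 0 and replica 1.
assignments = [pool.pick() for _ in range(5)]

# The same key always lands on the same shard.
shards = {doc_id: shard_for(doc_id, 3) for doc_id in ["doc-a", "doc-b", "doc-c"]}
print(assignments, shards)
```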

Example scaling a Stable Diffusion deployment:

jtype: Deployment
with:
 uses: TextToImage
 timeout_ready: -1
 py_modules:
   - text_to_image.py
 env:
  CUDA_VISIBLE_DEVICES: RR
 replicas: 2
 uses_dynamic_batching:
   /default:
     preferred_batch_size: 10
     timeout: 200
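
The two dynamic batching settings can be read as "flush when this many requests are waiting, or when the oldest has waited this long". Here is a minimal asyncio sketch of that policy, assuming `timeout` is given in milliseconds. `DynamicBatcher` and `fake_model` are hypothetical names, and this is not Jina's implementation.

```python
import asyncio


class DynamicBatcher:
    """Collects single requests and calls the handler once per batch."""

    def __init__(self, handler, preferred_batch_size, timeout_ms):
        self.handler = handler
        self.preferred_batch_size = preferred_batch_size
        self.timeout = timeout_ms / 1000  # seconds
        self.pending = []  # (item, future) pairs waiting for a batch
        self.timer = None

    async def submit(self, item):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((item, fut))
        if len(self.pending) >= self.preferred_batch_size:
            self._flush()  # batch is full: run it now
        elif self.timer is None:
            # First item of a new batch: start the timeout clock
            self.timer = loop.call_later(self.timeout, self._flush)
        return await fut

    def _flush(self):
        if self.timer is not None:
            self.timer.cancel()
            self.timer = None
        batch, self.pending = self.pending, []
        if not batch:
            return
        results = self.handler([item for item, _ in batch])
        for (_, fut), res in zip(batch, results):
            fut.set_result(res)


batch_sizes = []


def fake_model(batch):
    """Stand-in for model inference; records the batch size it was given."""
    batch_sizes.append(len(batch))
    return [x * 2 for x in batch]


async def main():
    batcher = DynamicBatcher(fake_model, preferred_batch_size=4, timeout_ms=50)
    return await asyncio.gather(*(batcher.submit(i) for i in range(6)))


results = asyncio.run(main())
print(results, batch_sizes)
```

With six concurrent requests and a preferred size of four, the first four are processed together immediately and the remaining two are flushed when the timeout expires.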

Cloud Deployment

Containerize Services

Structure your Executor:

TextToImage/
├── executor.py
├── config.yml
└── requirements.txt

Configure:

# config.yml
jtype: TextToImage
py_modules:
 - executor.py
metas:
 name: TextToImage
 description: Text to Image generation Executor

Push to Hub:

jina hub push TextToImage

Deploy to Kubernetes

jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s

Use Docker Compose

jina export docker-compose flow.yml docker-compose.yml
docker-compose up

JCloud Deployment

Deploy with a single command:

jina cloud deploy jcloud-flow.yml

LLM Streaming

Enable token-by-token streaming for responsive LLM applications:

Define schemas:

from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str

Initialize service:

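import torch
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Module-level tokenizer used by the streaming endpoint
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')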

Implement streaming:

# Method of TokenStreamingExecutor: indent it inside the class body
@requests(on='/stream')
async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
    input = tokenizer(doc.prompt, return_tensors='pt')
    input_len = input['input_ids'].shape[1]
    for _ in range(doc.max_tokens):
        output = self.model.generate(**input, max_new_tokens=1)
        if output[0][-1] == tokenizer.eos_token_id:
            break
        yield ModelOutputDocument(
            token_id=int(output[0][-1]),  # convert the 0-d tensor to a plain int
            generated_text=tokenizer.decode(
                output[0][input_len:], skip_special_tokens=True
            ),
        )
        input = {
            'input_ids': output,
            'attention_mask': torch.ones(1, len(output[0])),
        }

Serve and use:

# Server
from jina import Deployment

with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()


# Client (run in a separate process while the server is up)
import asyncio

from jina import Client


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
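
The streaming pattern above rests on an async generator: the endpoint yields one partial result per step, and the consumer handles each as it arrives instead of waiting for the full answer. The sketch below shows that pattern with the standard library only. `stream_tokens` is a hypothetical function, and the canned word list stands in for a real model.

```python
import asyncio


async def stream_tokens(prompt, max_tokens):
    """Yield one partial result per generated token.

    The prompt is ignored here; a fixed word list stands in for a model.
    """
    words = ["Paris", "is", "the", "capital", "of", "France", "."]
    generated = []
    for token_id, word in enumerate(words[:max_tokens]):
        await asyncio.sleep(0)  # hand control back to the event loop, as a real call would
        generated.append(word)
        yield {"token_id": token_id, "generated_text": " ".join(generated)}


async def main():
    seen = []
    # Each chunk is available as soon as it is yielded.
    async for chunk in stream_tokens("what is the capital of France ?", max_tokens=3):
        seen.append(chunk["generated_text"])
    return seen


chunks = asyncio.run(main())
print(chunks)
```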

Support

Jina-serve is backed by Jina AI and licensed under Apache-2.0.
