
pytorch/serve

Serve, optimize and scale PyTorch models in production



Quick Overview

TorchServe is an open-source model serving framework for PyTorch. It provides a flexible and easy-to-use solution for deploying and serving PyTorch models in production environments. TorchServe offers features like model versioning, metrics, and multi-model serving, making it suitable for various deployment scenarios.

Pros

  • Easy to use and deploy, with minimal configuration required
  • Supports both REST and gRPC APIs for model inference
  • Provides built-in model management and versioning capabilities
  • Offers customizable handlers for pre- and post-processing of inputs and outputs

Cons

  • Limited to PyTorch models, not suitable for other deep learning frameworks
  • May have performance overhead compared to more specialized serving solutions
  • Documentation can be sparse or outdated in some areas
  • Relatively new project, which may lead to potential stability issues or frequent changes

Code Examples

  1. Creating a custom handler:
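from ts.torch_handler.base_handler import BaseHandler

class MyCustomHandler(BaseHandler):
    def preprocess(self, data):
        # Custom preprocessing logic goes here; this falls back to the default
        processed_data = super().preprocess(data)
        return processed_data

    def inference(self, data):
        # Custom inference logic goes here; the default runs the loaded model
        predictions = super().inference(data)
        return predictions

    def postprocess(self, data):
        # Custom postprocessing logic goes here; return one item per request
        final_output = super().postprocess(data)
        return final_output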
  2. Packaging a model into a .mar archive:
mkdir -p model_store
torch-model-archiver --model-name my_model --version 1.0 --model-file my_model.py --serialized-file my_model.pth --handler my_custom_handler.py --export-path model_store
  3. Starting TorchServe:
torchserve --start --ncs --model-store model_store --models my_model.mar
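
The three hooks in the custom handler above run in a fixed order for each batch of requests. The following is a minimal, dependency-free sketch of that flow. `SketchHandler`, its `handle` method, and the doubling "model" are hypothetical stand-ins written for illustration; they are not TorchServe's `BaseHandler`.

```python
from typing import Any, Dict, List


class SketchHandler:
    """Hypothetical stand-in that mimics preprocess -> inference -> postprocess."""

    def preprocess(self, data: List[Dict[str, Any]]) -> List[float]:
        # Assumption: each request carries its payload under "data" or "body".
        return [float(row.get("data", row.get("body"))) for row in data]

    def inference(self, data: List[float]) -> List[float]:
        # Placeholder "model": doubles every input value.
        return [x * 2 for x in data]

    def postprocess(self, data: List[float]) -> List[Dict[str, float]]:
        # One response item per request, in request order.
        return [{"prediction": y} for y in data]

    def handle(self, data: List[Dict[str, Any]]) -> List[Dict[str, float]]:
        # Chain the three hooks in order.
        return self.postprocess(self.inference(self.preprocess(data)))


batch = [{"data": 1.5}, {"body": "4"}]
result = SketchHandler().handle(batch)
print(result)
```

Keeping one output item per input request matters because the server has to map each response back to the request that produced it.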

Getting Started

  1. Install TorchServe:
pip install torchserve torch-model-archiver torch-workflow-archiver
  2. Create a model archive:
torch-model-archiver --model-name my_model --version 1.0 --model-file path/to/model.py --serialized-file path/to/model.pth --handler image_classifier
  3. Start TorchServe:
mkdir model_store
mv my_model.mar model_store/
torchserve --start --ncs --model-store model_store --models my_model.mar
  4. Make an inference request (if token authorization is enabled, which the README describes as the default, the request also needs an Authorization header, or the server must be started with token authorization disabled):
curl http://localhost:8080/predictions/my_model -T examples/image_classifier/kitten.jpg
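
The same inference call can be made from Python. This is a minimal sketch using only the standard library; the helper name `build_prediction_request` and the example token are hypothetical, and the request is only built here, not sent, so it runs without a server.

```python
import urllib.request
from typing import Optional


def build_prediction_request(model_name: str, payload: bytes,
                             host: str = "http://localhost:8080",
                             token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request to the /predictions/<model> endpoint."""
    headers = {}
    if token:
        # Only needed when token authorization is enabled on the server.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{host}/predictions/{model_name}",
        data=payload,
        headers=headers,
        method="PUT",  # curl -T uploads the file with PUT
    )


req = build_prediction_request("my_model", b"<image bytes>", token="example-key")
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req).read()
```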

Competitor Comparisons

6,195

A flexible, high-performance serving system for machine learning models

Pros of TensorFlow Serving

  • More mature and battle-tested in production environments
  • Supports model versioning and A/B testing out of the box
  • Offers high-performance serving with optimized C++ runtime

Cons of TensorFlow Serving

  • Limited to TensorFlow models only
  • Steeper learning curve and more complex setup
  • Less flexibility for custom preprocessing and postprocessing

Code Comparison

TensorFlow Serving (using gRPC):

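import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Assumes a server listening on localhost:8500
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

data = [[1.0, 2.0, 3.0]]  # placeholder input; the shape depends on the model
request = predict_pb2.PredictRequest()
request.model_spec.name = 'model'
request.inputs['input'].CopyFrom(tf.make_tensor_proto(data))
result = stub.Predict(request, 10.0)  # 10-second timeout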

TorchServe:

import json
import requests

input_data = [[1.0, 2.0, 3.0]]  # placeholder input; the shape depends on the model
data = {'input': input_data}
response = requests.post("http://localhost:8080/predictions/model", data=json.dumps(data))
result = response.json()

TorchServe exposes both REST and gRPC inference APIs and keeps custom pre- and post-processing simple through Python handlers. TensorFlow Serving also offers both API styles, backed by an optimized C++ runtime and built-in model version management, but it is tied to TensorFlow models and takes more effort to set up.

7,069

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

Pros of BentoML

  • More flexible model serving framework, supporting multiple ML frameworks beyond PyTorch
  • Provides a unified API for model packaging, deployment, and management
  • Offers built-in model versioning and experiment tracking capabilities

Cons of BentoML

  • Steeper learning curve due to its more comprehensive feature set
  • May have higher resource overhead for simpler serving scenarios
  • Less tightly integrated with PyTorch ecosystem

Code Comparison

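BentoML (legacy 0.13-style service API):

import bentoml
from bentoml.adapters import ImageInput
from bentoml.frameworks.pytorch import PytorchModelArtifact

@bentoml.env(pip_packages=["torch"])
@bentoml.artifacts([PytorchModelArtifact("model")])
class MyService(bentoml.BentoService):
    @bentoml.api(input=ImageInput(), batch=False)
    def predict(self, image):
        # The packed PyTorch model is available under self.artifacts
        return self.artifacts.model(image)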

TorchServe:

import torch
from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        return torch.tensor(data)

    def inference(self, data):
        return self.model.forward(data)

    def postprocess(self, data):
        return data.tolist()
8,019

Production infrastructure for machine learning at scale

Pros of Cortex

  • Supports multiple machine learning frameworks (TensorFlow, PyTorch, scikit-learn, etc.)
  • Provides automatic scaling and infrastructure management
  • Offers a more comprehensive end-to-end ML deployment solution

Cons of Cortex

  • Steeper learning curve due to more complex architecture
  • Less tightly integrated with PyTorch ecosystem
  • May have higher operational costs for smaller projects

Code Comparison

Cortex deployment:

- name: iris-classifier
  predictor:
    type: python
    path: predictor.py
  compute:
    cpu: 1

TorchServe deployment:

torch-model-archiver --model-name densenet161 --version 1.0 --model-file model.py --serialized-file densenet161-8d451a50.pth --export-path model_store --extra-files index_to_name.json --handler image_classifier
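
When archives are built from a script, the same command line can be assembled programmatically. This is a small sketch; the helper `archiver_command` is hypothetical, and it only builds the argument list using the flags shown in the command above (running it would additionally require `torch-model-archiver` to be installed).

```python
import shlex
from typing import List, Optional, Sequence


def archiver_command(model_name: str, version: str, serialized_file: str,
                     handler: str, model_file: Optional[str] = None,
                     export_path: Optional[str] = None,
                     extra_files: Optional[Sequence[str]] = None) -> List[str]:
    """Assemble a torch-model-archiver argument list without executing it."""
    cmd = ["torch-model-archiver", "--model-name", model_name, "--version", version]
    if model_file:
        cmd += ["--model-file", model_file]
    cmd += ["--serialized-file", serialized_file]
    if export_path:
        cmd += ["--export-path", export_path]
    if extra_files:
        # Multiple extra files are passed as one comma-separated value.
        cmd += ["--extra-files", ",".join(extra_files)]
    cmd += ["--handler", handler]
    return cmd


cmd = archiver_command(
    model_name="densenet161",
    version="1.0",
    model_file="model.py",
    serialized_file="densenet161-8d451a50.pth",
    export_path="model_store",
    extra_files=["index_to_name.json"],
    handler="image_classifier",
)
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```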

Summary

Cortex offers a more comprehensive solution for deploying machine learning models across various frameworks, with built-in scaling and infrastructure management. However, it may be more complex to set up and potentially costlier for smaller projects. TorchServe, on the other hand, provides a simpler, PyTorch-focused deployment option that might be more suitable for projects primarily using PyTorch models.

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Broader model support: Works with models from various frameworks, not just PyTorch
  • Performance optimizations: Offers advanced optimizations for inference across different hardware
  • Cross-platform compatibility: Supports a wide range of operating systems and devices

Cons of ONNX Runtime

  • Steeper learning curve: Requires more setup and configuration compared to TorchServe
  • Less integrated with PyTorch ecosystem: May require additional steps for PyTorch model deployment

Code Comparison

ONNX Runtime:

import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})

TorchServe:

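from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    def preprocess(self, data):
        # Preprocess input data (default implementation shown as a placeholder)
        return super().preprocess(data)

    def inference(self, data):
        # Perform inference with the model loaded by BaseHandler
        return super().inference(data)

    def postprocess(self, data):
        # Postprocess output data into one response item per request
        return super().postprocess(data)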

Both repositories offer robust solutions for model serving, with ONNX Runtime providing broader framework support and optimizations, while TorchServe offers tighter integration with the PyTorch ecosystem and simpler deployment for PyTorch models.

8,463

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Pros of Triton Inference Server

  • Supports multiple deep learning frameworks (TensorFlow, PyTorch, ONNX, etc.)
  • Offers advanced features like dynamic batching and model ensembling
  • Provides optimized performance for GPU inference

Cons of Triton Inference Server

  • Steeper learning curve and more complex setup
  • Less integrated with PyTorch ecosystem
  • May be overkill for simpler deployment scenarios

Code Comparison

Triton Inference Server (model configuration):

{
  "name": "mymodel",
  "backend": "pytorch",
  "max_batch_size": 8,
  "input": [{"name": "INPUT0", "data_type": "TYPE_FP32", "dims": [3, 224, 224]}],
  "output": [{"name": "OUTPUT0", "data_type": "TYPE_FP32", "dims": [1000]}]
}

TorchServe (model handler):

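from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # Preprocess input data (default implementation shown as a placeholder)
        return super().preprocess(data)

    def inference(self, data):
        # Perform inference
        return super().inference(data)

    def postprocess(self, data):
        # Postprocess output data
        return super().postprocess(data)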

Both servers offer powerful inference capabilities, but Triton Inference Server provides more flexibility across frameworks and advanced features, while TorchServe offers a simpler, more PyTorch-centric approach.

18,503

Open source platform for the machine learning lifecycle

Pros of MLflow

  • Broader scope: Supports the entire machine learning lifecycle, including experiment tracking, model packaging, and deployment
  • Language-agnostic: Works with various ML frameworks and languages, not limited to PyTorch
  • Extensive integrations: Offers integrations with popular data science tools and platforms

Cons of MLflow

  • Less specialized for serving: Not as focused on model serving capabilities as TorchServe
  • Steeper learning curve: May require more time to set up and configure due to its comprehensive feature set

Code Comparison

MLflow example:

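import mlflow
import mlflow.pytorch
import torch

model = torch.nn.Linear(4, 2)  # placeholder model

# The context manager ends the run even if logging raises
with mlflow.start_run():
    mlflow.log_param("param1", 5)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.pytorch.log_model(model, "model")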

TorchServe example:

import torch
from torch import nn

class MyModel(nn.Module):
    def forward(self, x):
        return x * 2

model = MyModel()
torch.save(model.state_dict(), "mymodel.pth")

Both repositories offer valuable tools for machine learning workflows, with MLflow providing a more comprehensive solution for the entire ML lifecycle, while TorchServe focuses specifically on serving PyTorch models efficiently.


README

❗ANNOUNCEMENT: Security Changes❗

TorchServe now enables token authorization and disables model API control by default. These security features are intended to prevent unauthorized API calls and to keep potentially malicious code from being introduced to the model server. Refer to the following documentation for more information: Token Authorization, Model API control
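
With token authorization on, each request has to carry a key. The sketch below assumes, based on the token authorization documentation, that the server writes a `key_file.json` holding one key per API and that the key is sent as a bearer token. The helper name `auth_header` and the sample keys are made up for illustration; check the file your server actually generates.

```python
import json
import tempfile
from pathlib import Path


def auth_header(key_file: Path, api: str = "inference") -> dict:
    """Read the key for one API ("inference" or "management") and build the header."""
    keys = json.loads(key_file.read_text())
    return {"Authorization": f"Bearer {keys[api]['key']}"}


# Demo with a made-up key file; a real one is written by the server at startup.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "key_file.json"
    sample.write_text(json.dumps({
        "management": {"key": "mgmt-key-123"},
        "inference": {"key": "infer-key-456"},
    }))
    header = auth_header(sample)
    mgmt_header = auth_header(sample, api="management")

print(header)
```

Separate keys for inference and management let a client that only needs predictions avoid holding a credential that can register or remove models.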

TorchServe


TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires Python >= 3.8

curl http://127.0.0.1:8080/predictions/bert -T input.txt

🚀 Quick start with TorchServe

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

🚀 Quick start with TorchServe (conda)

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver

Getting started guide

🐳 Quick Start with Docker

# Latest release
docker pull pytorch/torchserve

# Nightly build
docker pull pytorch/torchserve-nightly

Refer to torchserve docker for details.

🤖 Quick Start LLM Deployment

vLLM Engine

# Make sure to install torchserve with pip or conda as described above and login with `huggingface-cli login`
python -m ts.llm_launcher --model_id meta-llama/Llama-3.2-3B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Llama-3.2-3B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
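
The completion request above can also be issued from Python. This is a minimal standard-library sketch; the helper name `build_completion_request` is hypothetical, the URL and JSON fields are copied from the curl command, and the request is only constructed here so the snippet runs without a server.

```python
import json
import urllib.request

COMPLETIONS_URL = "http://localhost:8080/predictions/model/1.0/v1/completions"


def build_completion_request(prompt: str, model_id: str, max_tokens: int = 200,
                             url: str = COMPLETIONS_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON completion request."""
    body = json.dumps(
        {"model": model_id, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("Hello, my name is", "meta-llama/Llama-3.2-3B-Instruct")
print(json.loads(req.data))
# To send it against a running server: urllib.request.urlopen(req).read()
```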

TRT-LLM Engine

# Make sure to install torchserve with python venv as described above and login with `huggingface-cli login`
# pip install -U --use-deprecated=legacy-resolver -r requirements/trt_llm.txt
python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct --engine trt_llm --disable_token_auth

# Try it out
curl -X POST -d '{"prompt":"count from 1 to 9 in french ", "max_tokens": 100}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"

🚢 Quick Start LLM Deployment with Docker

#export token=<HUGGINGFACE_HUB_TOKEN>
docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm

docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"

Refer to LLM deployment for details and other methods.

⚡ Why TorchServe

🤔 How does TorchServe work

🏆 Highlighted Examples

For more examples

🛡️ TorchServe Security Policy

SECURITY.md

🤓 Learn More

https://pytorch.org/serve

🫂 Contributing

We welcome all contributions!

To learn more about how to contribute, see the contributor guide here.

📰 News


⚖️ Disclaimer

This repository is jointly operated and maintained by Amazon, Meta and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com. For all other questions, please open up an issue in this repository here.

TorchServe acknowledges the Multi Model Server (MMS) project, from which it was derived.