tensorflow/serving

A flexible, high-performance serving system for machine learning models

6,158 stars · 2,190 forks

Top Related Projects

  • MediaPipe (27,075 stars): Cross-platform, customizable ML solutions for live and streaming media.
  • ONNX (17,765 stars): Open standard for machine learning interoperability
  • ONNX Runtime: Cross-platform, high performance ML inferencing and training accelerator
  • Serve (4,185 stars): Serve, optimize and scale PyTorch models in production
  • Triton Inference Server (8,010 stars): The Triton Inference Server provides an optimized cloud and edge inferencing solution.
  • BentoML (7,069 stars): The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

Quick Overview

TensorFlow Serving is an open-source serving system for machine learning models, designed to take models from experimentation to production environments. It allows for easy deployment of algorithms and experiments while maintaining the same server architecture and APIs. TensorFlow Serving is particularly well-suited for TensorFlow models but can be extended to serve other types of models and data.

Pros

  • Flexible architecture that supports multiple machine learning frameworks
  • Efficient model versioning and concurrent model serving
  • High performance, supporting batching and GPU acceleration
  • Easy integration with TensorFlow and other ML ecosystems

Cons

  • Steep learning curve for beginners
  • Limited support for non-TensorFlow models out of the box
  • Can be resource-intensive for large-scale deployments
  • Documentation can be sparse for advanced use cases

Code Examples

  1. Making a prediction request over gRPC:
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Connect to a running TensorFlow Serving instance (the gRPC port defaults to 8500)
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

input_data = [[1.0, 2.0, 3.0]]  # Replace with input matching your model's signature

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'
request.inputs['input'].CopyFrom(tf.make_tensor_proto(input_data))

result = stub.Predict(request, 10.0)  # 10 secs timeout
  2. Exporting a Keras model for serving:
import tensorflow as tf

model = tf.keras.Sequential([...])  # Define your model
model.compile(...)
model.fit(...)

# Export a SavedModel into a numbered version subdirectory;
# TensorFlow Serving loads the highest version it finds there.
tf.saved_model.save(model, '/path/to/save/my_model/1')

# The exported model is then served with the standalone model server, e.g.:
# tensorflow_model_server --rest_api_port=8501 \
#     --model_name=my_model --model_base_path=/path/to/save/my_model
  3. Making predictions using the REST API:
import requests
import json

data = json.dumps({"signature_name": "serving_default", "instances": [1.0, 2.0, 3.0]})
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/my_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']
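
You can also inspect a served model's signatures through the REST metadata endpoint. A minimal sketch, assuming the same model name and REST port as in the example above:

import requests

# The response includes the model's signature_def (input/output names and shapes)
metadata = requests.get('http://localhost:8501/v1/models/my_model/metadata')
print(metadata.json())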

Getting Started

  1. Install TensorFlow Serving:
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install tensorflow-model-server
  2. Serve a model:
tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=my_model --model_base_path=/path/to/my_model
  3. Make predictions using the gRPC API:
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

input_data = [[1.0, 2.0, 3.0]]  # Replace with input matching your model's signature

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'
request.inputs['input'].CopyFrom(tf.make_tensor_proto(input_data))
result = stub.Predict(request, 10.0)  # 10 secs timeout
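
Before sending predictions, you can confirm that the model loaded correctly via the REST model-status endpoint. A minimal sketch, assuming the server from step 2 is running with --rest_api_port=8501 and the model name my_model:

import requests

# Returns the loaded version(s) and their state (e.g. AVAILABLE)
status = requests.get('http://localhost:8501/v1/models/my_model')
print(status.json())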

Competitor Comparisons

MediaPipe (27,075 stars)

Cross-platform, customizable ML solutions for live and streaming media.

Pros of MediaPipe

  • More versatile, supporting various tasks beyond model serving (e.g., audio, video processing)
  • Cross-platform support (mobile, web, desktop)
  • Easier to integrate into end-user applications

Cons of MediaPipe

  • Less focused on high-performance model serving
  • May require more setup and configuration for specific use cases
  • Potentially steeper learning curve due to broader feature set

Code Comparison

MediaPipe (graph-based pipeline):

import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands()
# `image` is an RGB image as a NumPy array
results = hands.process(image)

TensorFlow Serving (REST API request):

import requests

# REST predict endpoint of a running TensorFlow Serving instance;
# `image` is a NumPy array (e.g. a decoded image)
url = 'http://localhost:8501/v1/models/my_model:predict'
data = {"instances": [image.tolist()]}
response = requests.post(url, json=data)
predictions = response.json()["predictions"]

MediaPipe offers a more integrated approach for specific tasks, while TensorFlow Serving focuses on efficient model deployment and scaling. MediaPipe is better suited for end-to-end applications, especially on mobile and web platforms, while TensorFlow Serving excels in high-performance model serving for production environments.

ONNX (17,765 stars)

Open standard for machine learning interoperability

Pros of ONNX

  • Framework-agnostic: Supports multiple ML frameworks, not limited to TensorFlow
  • Broader ecosystem: Wide range of tools and libraries for model conversion and optimization
  • Lightweight: Focused on model representation, not tied to a specific serving infrastructure

Cons of ONNX

  • Less integrated: Requires additional components for deployment and serving
  • Limited built-in serving capabilities: Primarily a model format, not a complete serving solution
  • Steeper learning curve: May require more setup and configuration for deployment

Code Comparison

ONNX model definition:

import onnx

# Graph inputs and outputs must be declared as tensor value infos
x = onnx.helper.make_tensor_value_info("x", onnx.TensorProto.FLOAT, [1, 5])
y = onnx.helper.make_tensor_value_info("y", onnx.TensorProto.FLOAT, [1, 5])
node = onnx.helper.make_node("Relu", inputs=["x"], outputs=["y"])
graph = onnx.helper.make_graph([node], "test-model", [x], [y])
model = onnx.helper.make_model(graph)

TensorFlow Serving model definition:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(5,)),
    tf.keras.layers.Dense(1)
])
tf.saved_model.save(model, "saved_model_dir")

ONNX Runtime

Cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Supports multiple frameworks (TensorFlow, PyTorch, etc.) through ONNX format
  • Offers better performance optimization across different hardware
  • Provides a more flexible deployment option for various platforms

Cons of ONNX Runtime

  • May require additional steps to convert models to ONNX format
  • Less mature ecosystem compared to TensorFlow Serving
  • Potentially more complex setup for certain use cases

Code Comparison

ONNX Runtime:

import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
# `input_data` is a NumPy array matching the model's expected input shape and dtype
output = session.run(None, {input_name: input_data})

TensorFlow Serving:

import tensorflow as tf
model = tf.saved_model.load("saved_model_dir")
output = model.signatures["serving_default"](tf.constant(input_data))

Both ONNX Runtime and TensorFlow Serving aim to provide efficient model serving solutions. ONNX Runtime offers broader framework support and potentially better cross-platform optimization, while TensorFlow Serving provides a more streamlined experience for TensorFlow models with a mature ecosystem. The choice between them depends on specific project requirements and the frameworks used in model development.
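
In practice, the extra conversion step noted in the cons above is commonly handled with the tf2onnx package (one option among several, and not part of either project). A minimal sketch, assuming a Keras model and that tf2onnx is installed:

import tensorflow as tf
import tf2onnx  # pip install tf2onnx

# Build (or load) a Keras model to convert
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(5,)),
    tf.keras.layers.Dense(1)
])

# Convert the Keras model to ONNX and write it to disk;
# the resulting model.onnx can then be loaded by ONNX Runtime.
spec = (tf.TensorSpec((None, 5), tf.float32, name="input"),)
model_proto, _ = tf2onnx.convert.from_keras(model, input_signature=spec, output_path="model.onnx")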

Serve (4,185 stars)

Serve, optimize and scale PyTorch models in production

Pros of Serve

  • More flexible and customizable serving architecture
  • Easier integration with PyTorch ecosystem and models
  • Supports multi-model serving out of the box

Cons of Serve

  • Less mature and battle-tested in production environments
  • Smaller community and ecosystem compared to TensorFlow Serving
  • Limited support for non-PyTorch models

Code Comparison

Serve:

import torch
from torchvision import models
model = models.resnet18(pretrained=True)
torch.save(model.state_dict(), "resnet18.pth")

TensorFlow Serving:

import tensorflow as tf
model = tf.keras.applications.ResNet50(weights='imagenet')
tf.saved_model.save(model, "resnet50/1/")

Both frameworks offer straightforward ways to save models for serving, but Serve uses PyTorch's native format while TensorFlow Serving uses the SavedModel format. Serve's approach is more aligned with PyTorch's ecosystem, making it easier for PyTorch users to deploy their models. However, TensorFlow Serving's SavedModel format is more widely supported and offers better versioning capabilities.
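
To illustrate the versioning point, TensorFlow Serving expects each version of a model in its own numbered subdirectory under the model base path, so rolling out a new version is just writing a new directory. A minimal sketch (paths and version numbers are illustrative):

import tensorflow as tf

model = tf.keras.applications.ResNet50(weights='imagenet')

# Each numbered subdirectory is a separate version of the same model;
# TensorFlow Serving loads the highest version by default and can switch
# between versions without any client-side changes.
tf.saved_model.save(model, "/models/resnet50/1")
# ... later, after retraining or fine-tuning:
tf.saved_model.save(model, "/models/resnet50/2")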

Triton Inference Server (8,010 stars)

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Pros of Triton Inference Server

  • Supports multiple deep learning frameworks (TensorFlow, PyTorch, ONNX, etc.)
  • Offers dynamic batching and concurrent model execution
  • Provides GPU acceleration and multi-GPU support

Cons of Triton Inference Server

  • Steeper learning curve due to more complex configuration
  • May have higher resource overhead for simpler deployment scenarios

Code Comparison

TensorFlow Serving:

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# `input_data` is a NumPy array or nested list matching the model's input
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.inputs['input'].CopyFrom(tf.make_tensor_proto(input_data))
result = stub.Predict(request, 10.0)

Triton Inference Server:

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url='localhost:8001')
# `input_shape` is the request tensor shape; `input_data` a matching float32 NumPy array
inputs = [grpcclient.InferInput('input', input_shape, 'FP32')]
inputs[0].set_data_from_numpy(input_data)
result = client.infer(model_name='my_model', inputs=inputs)

Both repositories provide powerful inference serving capabilities, but Triton Inference Server offers more flexibility in terms of supported frameworks and deployment options, while TensorFlow Serving is more tightly integrated with the TensorFlow ecosystem.

BentoML (7,069 stars)

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

Pros of BentoML

  • Supports multiple ML frameworks (TensorFlow, PyTorch, scikit-learn, etc.)
  • Easier to use and more flexible for various deployment scenarios
  • Built-in model versioning and management

Cons of BentoML

  • Less optimized for TensorFlow-specific deployments
  • Smaller community and ecosystem compared to TensorFlow Serving

Code Comparison

BentoML:

# Legacy BentoML 0.x-style service definition; the artifact and input adapter
# classes below are imported from BentoML's framework/adapter modules.
import bentoml

@bentoml.env(pip_packages=["tensorflow"])
@bentoml.artifacts([TensorflowSavedModelArtifact('model')])
class TensorflowModelService(bentoml.BentoService):
    @bentoml.api(input=TensorflowTensorInput())
    def predict(self, input_data):
        return self.artifacts.model(input_data)

TensorFlow Serving:

import tensorflow as tf

model = tf.saved_model.load("path/to/model")
serving_fn = model.signatures["serving_default"]

def predict(input_data):
    return serving_fn(tf.constant(input_data))

BentoML offers a more declarative approach with built-in service definition, while TensorFlow Serving requires separate model loading and serving setup. BentoML's code is more self-contained and easier to package for deployment, whereas TensorFlow Serving typically requires additional configuration files and deployment scripts.

README

TensorFlow Serving

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

To note a few features:

  • Can serve multiple models, or multiple versions of the same model simultaneously
  • Exposes both gRPC as well as HTTP inference endpoints
  • Allows deployment of new model versions without changing any client code
  • Supports canarying new versions and A/B testing experimental models
  • Adds minimal latency to inference time due to efficient, low-overhead implementation
  • Features a scheduler that groups individual inference requests into batches for joint execution on GPU, with configurable latency controls
  • Supports many servables: Tensorflow models, embeddings, vocabularies, feature transformations and even non-Tensorflow-based machine learning models
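
For instance, when several versions of a model are loaded, a client can pin a request to a particular version through the versioned REST API. A minimal sketch, assuming a model named my_model with version 2 available on the default REST port 8501 (the payload shape depends on the model):

import requests

# Address a specific model version instead of the default (highest) one
url = 'http://localhost:8501/v1/models/my_model/versions/2:predict'
response = requests.post(url, json={"instances": [1.0, 2.0, 5.0]})
print(response.json()["predictions"])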

Serve a Tensorflow model in 60 seconds

# Download the TensorFlow Serving Docker image and repo
docker pull tensorflow/serving

git clone https://github.com/tensorflow/serving
# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"

# Start TensorFlow Serving container and open the REST API port
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &

# Query the model using the predict API
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict

# Returns => { "predictions": [2.5, 3.0, 4.5] }

End-to-End Training & Serving Tutorial

Refer to the official Tensorflow documentation site for a complete tutorial on how to train and serve a Tensorflow model.

Documentation

Set up

The easiest and most straightforward way of using TensorFlow Serving is with Docker images. We highly recommend this route unless you have specific needs that are not addressed by running in a container.

Use

Export your Tensorflow model

In order to serve a Tensorflow model, simply export a SavedModel from your Tensorflow program. SavedModel is a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models.

Please refer to Tensorflow documentation for detailed instructions on how to export SavedModels.
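
As a minimal sketch of the export step (assuming a trained Keras model held in a variable named model; the path and version number are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(5,)),
    tf.keras.layers.Dense(1)
])
# ... train the model ...

# Write the SavedModel into a numbered version directory so that
# tensorflow_model_server can discover and load it.
tf.saved_model.save(model, "/models/my_model/1")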

Configure and Use Tensorflow Serving

Extend

Tensorflow Serving's architecture is highly modular. You can use some parts individually (e.g. batch scheduling) and/or extend it to serve new use cases.

Contribute

If you'd like to contribute to TensorFlow Serving, be sure to review the contribution guidelines.

For more information

Please refer to the official TensorFlow website for more information.