seldon-core
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
Top Related Projects
Standardized Serverless ML Inference Platform on Kubernetes
Open source platform for the machine learning lifecycle
Production infrastructure for machine learning at scale
The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
Open Source ML Model Versioning, Metadata, and Experiment Management
The Open Source Feature Store for Machine Learning
Quick Overview
Seldon Core is an open-source platform for deploying machine learning models on Kubernetes. It provides a flexible, scalable, and production-ready solution for serving ML models, offering features like A/B testing, canary deployments, and advanced monitoring capabilities.
Pros
- Seamless integration with Kubernetes for scalable ML model deployment
- Support for multiple ML frameworks (TensorFlow, PyTorch, scikit-learn, etc.)
- Advanced features like A/B testing, canary deployments, and explainers
- Extensive monitoring and observability tools
Cons
- Steep learning curve for those unfamiliar with Kubernetes
- Complex setup process for advanced features
- Limited support for edge deployment scenarios
- Resource-intensive for small-scale projects
Code Examples
- Creating a simple Seldon deployment:
```python
from seldon_core.seldon_client import SeldonClient
import numpy as np

sc = SeldonClient(deployment_name="mymodel", namespace="seldon")
client_prediction = sc.predict(data=np.array([[1, 2, 3]]))
print(client_prediction)
```
- Implementing a custom model:
```python
class MyModel:
    def __init__(self):
        print("Initializing")

    def predict(self, X, features_names=None):
        print("Predict called")
        return X

    def metrics(self):
        return [
            {"type": "COUNTER", "key": "mycounter", "value": 1},
            {"type": "GAUGE", "key": "mygauge", "value": 100},
        ]
```
- Creating a Seldon deployment using SeldonDeployment CRD:
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: seldonio/sklearn-iris:0.1
    graph:
      name: classifier
      type: MODEL
    name: example
    replicas: 1
```
Getting Started
- Install Seldon Core on your Kubernetes cluster:
```shell
kubectl create namespace seldon-system
helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usageMetrics.enabled=true \
    --namespace seldon-system
```
- Create a Seldon deployment (using the YAML example above):
```shell
kubectl apply -f seldon-deployment.yaml
```
- Access your model:
```shell
kubectl port-forward svc/seldon-model-example 8000:8000

curl -X POST http://localhost:8000/api/v1.0/predictions \
    -H 'Content-Type: application/json' \
    -d '{ "data": { "ndarray": [[1,2,3,4]] } }'
```
Competitor Comparisons
Standardized Serverless ML Inference Platform on Kubernetes
Pros of KServe
- Deeper integration with Kubernetes ecosystem and Knative
- More extensive support for model serving frameworks (TensorFlow, PyTorch, scikit-learn, etc.)
- Built-in support for model explainability and drift detection
Cons of KServe
- Steeper learning curve due to more complex architecture
- Less flexibility in custom model deployment compared to Seldon Core
- Requires Istio for full functionality, which can add complexity
Code Comparison
KServe example:
```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"
```
Seldon Core example:
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn-iris
spec:
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
    name: default
```
Both KServe and Seldon Core are powerful platforms for deploying machine learning models in Kubernetes environments. KServe offers tighter integration with the Kubernetes ecosystem and broader support for various frameworks, while Seldon Core provides more flexibility for custom deployments and a gentler learning curve.
Open source platform for the machine learning lifecycle
Pros of MLflow
- More comprehensive ML lifecycle management, including experiment tracking and model registry
- Language-agnostic with support for Python, R, Java, and more
- Easier to set up and use for individual data scientists or small teams
Cons of MLflow
- Less focused on production deployment and scaling of ML models
- Limited built-in support for advanced serving features like A/B testing and canary deployments
- Requires additional tools for robust production-grade model serving
Code Comparison
MLflow example:
```python
import mlflow

mlflow.start_run()
mlflow.log_param("param1", 5)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()
```
Seldon Core example:
```python
from seldon_core.seldon_client import SeldonClient
import numpy as np

X = np.array([[1, 2, 3]])
sc = SeldonClient(deployment_name="mymodel", namespace="default")
response = sc.predict(data=X)
print(response)
```
MLflow focuses on tracking experiments and logging metrics, while Seldon Core is designed for deploying and serving models in production environments. MLflow provides a more comprehensive solution for the entire ML lifecycle, whereas Seldon Core excels in robust, scalable model deployment on Kubernetes.
Production infrastructure for machine learning at scale
Pros of Cortex
- Simpler deployment process with automatic infrastructure provisioning
- Native support for AWS, reducing complexity for AWS users
- Built-in autoscaling and GPU support out of the box
Cons of Cortex
- Limited to AWS, while Seldon Core supports multiple cloud providers
- Smaller community and ecosystem compared to Seldon Core
- Less flexibility in terms of customization and integration options
Code Comparison
Cortex deployment example:
```yaml
- name: iris-classifier
  predictor:
    type: python
    path: predictor.py
  compute:
    gpu: 1
    mem: 4G
```
Seldon Core deployment example:
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
spec:
  predictors:
  - graph:
      name: classifier
      implementation: SKLEARN_SERVER
    name: default
```
Both frameworks aim to simplify ML model deployment, but Cortex focuses on AWS-specific deployments with a more streamlined approach, while Seldon Core offers greater flexibility and multi-cloud support at the cost of increased complexity.
The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
Pros of BentoML
- Simpler setup and deployment process, especially for local development
- Built-in support for a wider range of ML frameworks and libraries
- More flexible API serving options, including REST, gRPC, and CLI
Cons of BentoML
- Less mature ecosystem for large-scale production deployments
- Fewer advanced features for monitoring and scaling in complex environments
- Limited native support for A/B testing and canary deployments
Code Comparison
BentoML:
```python
import bentoml

@bentoml.env(pip_packages=["scikit-learn"])
@bentoml.artifacts([bentoml.sklearn.SklearnModelArtifact('model')])
class SklearnIrisClassifier(bentoml.BentoService):
    @bentoml.api(input=bentoml.handlers.DataframeHandler())
    def predict(self, df):
        return self.artifacts.model.predict(df)
```
Seldon Core:
```python
import joblib

class IrisClassifier:
    def __init__(self):
        self.model = joblib.load('iris_model.joblib')

    def predict(self, X, features_names=None):
        return self.model.predict(X)
```
Both frameworks aim to simplify ML model deployment, but BentoML offers a more user-friendly approach for local development and supports a broader range of ML frameworks out-of-the-box. Seldon Core, on the other hand, provides more advanced features for production-grade deployments and integrates better with Kubernetes ecosystems.
Open Source ML Model Versioning, Metadata, and Experiment Management
Pros of ModelDB
- Focuses on model versioning and metadata tracking
- Provides a user-friendly web interface for experiment management
- Supports integration with popular ML frameworks like TensorFlow and PyTorch
Cons of ModelDB
- Less emphasis on model deployment and serving compared to Seldon Core
- May require additional tools for end-to-end MLOps workflows
- Limited support for advanced deployment scenarios like A/B testing
Code Comparison
ModelDB (Python client):
```python
from verta import Client

client = Client("http://localhost:3000")
proj = client.set_project("My Project")
expt = client.set_experiment("My Experiment")
run = client.set_experiment_run("My Run")
run.log_parameter("num_layers", 5)
run.log_metric("accuracy", 0.95)
```
Seldon Core (Deployment YAML):
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
spec:
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/iris
    name: default
```
The Open Source Feature Store for Machine Learning
Pros of Feast
- Specialized in feature management and serving for machine learning
- Supports multiple data sources and feature stores
- Provides a unified API for offline and online feature access
Cons of Feast
- Limited to feature management, not a complete MLOps solution
- Requires additional tools for model deployment and serving
- May have a steeper learning curve for teams new to feature stores
Code Comparison
Feast example:
```python
from feast import FeatureStore

store = FeatureStore("feature_repo/")
features = store.get_online_features(
    features=["driver:rating", "driver:trips_today"],
    entity_rows=[{"driver_id": 1001}],
)
```
Seldon Core example:
```python
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="mymodel", namespace="default")
response = sc.predict(data={"ndarray": [[1.0, 2.0, 5.0]]})
```
While Feast focuses on feature management and serving, Seldon Core provides a more comprehensive MLOps solution for model deployment and serving. Feast excels in feature engineering and storage, whereas Seldon Core offers broader capabilities for model deployment, A/B testing, and monitoring in production environments.
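In practice the two can be chained: Feast supplies fresh feature values and Seldon Core serves the model that consumes them. A dependency-free sketch of that glue (the client calls are stubbed out; the entity, feature values, and payload shape are illustrative):

```python
# Hypothetical glue between a feature store and a model server: fetch online
# features for an entity, then build the prediction payload Seldon expects.
# Real code would call Feast's get_online_features and SeldonClient.predict.
def fetch_features(driver_id: int) -> list:
    # stub for: store.get_online_features(features=[...], entity_rows=[...])
    return [4.8, 12.0]  # e.g. [rating, trips_today]

def build_payload(features: list) -> dict:
    # Seldon's v1 protocol expects a 2-D ndarray of feature rows
    return {"data": {"ndarray": [features]}}

payload = build_payload(fetch_features(1001))
```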
README
Seldon Core: Blazing Fast, Industry-Ready ML
A platform to deploy your machine learning models on Kubernetes at massive scale.
Seldon Core V2 Now Available
Seldon Core V2 is now available. If you're new to Seldon Core, we recommend you start here. Check out the docs, make sure to leave feedback on our Slack community, and submit bugs or feature requests on the repo. The codebase can be found in this branch.
Continue reading for info on Seldon Core V1...
Overview
Seldon Core converts your ML models (TensorFlow, PyTorch, H2O, etc.) or language wrappers (Python, Java, etc.) into production REST/gRPC microservices.
Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out of the box including Advanced Metrics, Request Logging, Explainers, Outlier Detectors, A/B Tests, Canaries and more.
- Read the Seldon Core Documentation
- Join our community Slack to ask any questions
- Get started with Seldon Core Notebook Examples
- Join our fortnightly online working group calls: Google Calendar
- Learn how you can start contributing
- Check out Blogs that dive into Seldon Core components
- Watch some of the Videos and Talks using Seldon Core
High Level Features
With over 2M installs, Seldon Core is used across organisations to manage large scale deployment of machine learning models, and key benefits include:
- Easy way to containerise ML models using our pre-packaged inference servers, custom servers, or language wrappers.
- Out-of-the-box endpoints which can be tested through Swagger UI, the Seldon Python Client, or curl/grpcurl.
- Cloud agnostic and tested on AWS EKS, Azure AKS, Google GKE, Alibaba Cloud, Digital Ocean and OpenShift.
- Powerful and rich inference graphs made out of predictors, transformers, routers, combiners, and more.
- Metadata provenance to ensure each model can be traced back to its respective training system, data and metrics.
- Advanced and customisable metrics with integration to Prometheus and Grafana.
- Full auditability through model input-output request logging integration with Elasticsearch.
- Microservice distributed tracing through integration to Jaeger for insights on latency across microservice hops.
- Secure, reliable and robust system maintained through a consistent security & updates policy.
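As a concrete illustration of the custom metrics hook, a Python-wrapped model can return Prometheus-style metrics from a `metrics` method, which the wrapper scrapes after each request (a minimal sketch; the metric keys and the echo model are illustrative, not a prescribed schema):

```python
# Minimal sketch of a Seldon Python-wrapper model exposing custom metrics.
# After each prediction the wrapper reads metrics() and surfaces the values
# to Prometheus; the keys and the echo "model" here are illustrative.
class MetricsModel:
    def __init__(self):
        self.request_count = 0

    def predict(self, X, features_names=None):
        self.request_count += 1
        return X  # echo model, stands in for real inference

    def metrics(self):
        return [
            {"type": "COUNTER", "key": "requests_total", "value": 1},
            {"type": "GAUGE", "key": "requests_seen", "value": self.request_count},
        ]
```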
Getting Started
Deploying your models with Seldon Core is simplified through our pre-packaged inference servers and language wrappers. Below you can see how to deploy our "hello world" Iris example. You can see more details on these workflows in our Documentation Quickstart.
Install Seldon Core
Quick install using Helm 3 (you can also use Kustomize):
```shell
kubectl create namespace seldon-system
helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usageMetrics.enabled=true \
    --namespace seldon-system \
    --set istio.enabled=true
# You can set ambassador instead with --set ambassador.enabled=true
```
Deploy your model using pre-packaged model servers
We provide optimized model servers for some of the most popular Deep Learning and Machine Learning frameworks that allow you to deploy your trained model binaries/weights without having to containerize or modify them.
You only have to upload your model binaries to your preferred object store; in this case we have a trained scikit-learn iris model in a Google Cloud Storage bucket:

```
gs://seldon-models/v1.19.0-dev/sklearn/iris/model.joblib
```
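For reference, a `model.joblib` compatible with the scikit-learn server can be produced with ordinary scikit-learn tooling. This is a sketch of that step, not the exact script behind the published bucket:

```python
# Rough sketch of producing a model.joblib for the SKLEARN_SERVER; the exact
# training code behind the published bucket is an assumption. The resulting
# file is what you upload to your object store.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(clf, "model.joblib")
```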
Create a namespace to run your model in:
```shell
kubectl create namespace seldon
```
We can then deploy this model to our Kubernetes cluster using the pre-packaged model server for scikit-learn (SKLEARN_SERVER) by running the kubectl apply command below:
```shell
$ kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: seldon
spec:
  name: iris
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v1.19.0-dev/sklearn/iris
      name: classifier
    name: default
    replicas: 1
END
```
Send API requests to your deployed model
Every deployed model exposes a standardised user interface for sending requests based on our OpenAPI schema. It is available at the endpoint `http://<ingress_url>/seldon/<namespace>/<model-name>/api/v1.0/doc/`, which lets you send requests directly from your browser. Alternatively, you can send requests programmatically using our Seldon Python Client or command-line tools such as curl:
```shell
$ curl -X POST http://<ingress>/seldon/seldon/iris-model/api/v1.0/predictions \
    -H 'Content-Type: application/json' \
    -d '{ "data": { "ndarray": [[1,2,3,4]] } }'

{
  "meta": {},
  "data": {
    "names": ["t:0", "t:1", "t:2"],
    "ndarray": [[0.000698519453116284, 0.00366803903943576, 0.995633441507448]]
  }
}
```
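The same request can also be sent programmatically. A standard-library sketch of the client side (the ingress address is a placeholder you must fill in; the helper name is ours, not part of the Seldon API):

```python
# Sketch of sending a Seldon v1 prediction request with only the standard
# library. The payload matches the curl example above; the ingress address
# is a placeholder and must point at your cluster's gateway.
import json
from urllib import request

payload = {"data": {"ndarray": [[1, 2, 3, 4]]}}

def predict(ingress: str) -> list:
    """POST the payload to the deployed model and return class probabilities."""
    url = f"http://{ingress}/seldon/seldon/iris-model/api/v1.0/predictions"
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["data"]["ndarray"][0]

# e.g. predict("localhost:8003") after port-forwarding your ingress gateway
```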
Deploy your custom model using language wrappers
For more custom deep learning and machine learning use-cases which have custom dependencies (such as 3rd party libraries, operating system binaries or even external systems), we can use any of the Seldon Core language wrappers.
You only have to write a class wrapper that exposes the logic of your model; for example, in Python we can create a file `Model.py`:
```python
import pickle

class Model:
    def __init__(self):
        with open("model.pickle", "rb") as f:
            self._model = pickle.load(f)

    def predict(self, X, features_names=None):
        return self._model(X)
```
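The `model.pickle` file the wrapper loads just needs to unpickle into something callable. One hypothetical way to produce such a file (the stand-in model here is an assumption for illustration, not Seldon's example):

```python
# Produce a model.pickle that a wrapper like the one above can load. Any
# picklable callable works; this stand-in "model" simply sums each feature
# row in place of real inference.
import pickle

def model_fn(X):
    return [sum(row) for row in X]

with open("model.pickle", "wb") as f:
    pickle.dump(model_fn, f)
```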
We can now containerise our class file using the Seldon Core s2i utils to produce the `sklearn_iris` image:
```shell
s2i build . seldonio/seldon-core-s2i-python3:0.18 sklearn_iris:0.1
```
And we now deploy it to our Seldon Core Kubernetes Cluster:
```shell
$ kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: model-namespace
spec:
  name: iris
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: sklearn_iris:0.1
    graph:
      name: classifier
    name: default
    replicas: 1
END
```
Send API requests to your deployed model
Every deployed model exposes a standardised user interface for sending requests based on our OpenAPI schema. It is available at the endpoint `http://<ingress_url>/seldon/<namespace>/<model-name>/api/v1.0/doc/`, which lets you send requests directly from your browser. Alternatively, you can send requests programmatically using our Seldon Python Client or command-line tools such as curl:
```shell
$ curl -X POST http://<ingress>/seldon/model-namespace/iris-model/api/v1.0/predictions \
    -H 'Content-Type: application/json' \
    -d '{ "data": { "ndarray": [[1,2,3,4]] } }' | json_pp

{
  "meta": {},
  "data": {
    "names": ["t:0", "t:1", "t:2"],
    "ndarray": [[0.000698519453116284, 0.00366803903943576, 0.995633441507448]]
  }
}
```
Dive into the Advanced Production ML Integrations
Any model that is deployed and orchestrated with Seldon Core provides out of the box machine learning insights for monitoring, managing, scaling and debugging.
Below are some of the core components, together with links to the docs that provide further insight on how to set them up.
Where to go from here
Getting Started
Seldon Core Deep Dive
- Detailed Installation Parameters
- Pre-packaged Inference Servers
- Language Wrappers for Custom Models
- Create your Inference Graph
- Deploy your Model
- Testing your Model Endpoints
- Troubleshooting guide
- Usage reporting
- Upgrading
- Changelog
Pre-Packaged Inference Servers
Language Wrappers (Production)
Language Wrappers (Incubating)
- Java Language Wrapper [Incubating]
- R Language Wrapper [ALPHA]
- NodeJS Language Wrapper [ALPHA]
- Go Language Wrapper [ALPHA]
Ingress
Production
- Supported API Protocols
- CI/CD MLOps at Scale
- Metrics with Prometheus
- Payload Logging with ELK
- Distributed Tracing with Jaeger
- Replica Scaling
- Budgeting Disruptions
- Custom Inference Servers
Advanced Inference
Examples
Reference
- Annotation-based Configuration
- Benchmarking
- General Availability
- Helm Charts
- Images
- Logging & Log Level
- Private Docker Registry
- Prediction APIs
- Python API reference
- Release Highlights
- Seldon Deployment CRD
- Service Orchestrator
- Kubeflow
Developer
About the name "Seldon Core"
The name Seldon (ˈsɛldən) Core was inspired by the Foundation series of sci-fi novels, whose premise centres on a mathematician called "Hari Seldon" who spends his life developing a theory of psychohistory, a new and effective mathematical sociology that allows the future to be predicted extremely accurately over long periods of time (across hundreds of thousands of years).
Commercial Offerings
To learn more about our commercial offerings visit https://www.seldon.io/.
License