
feast-dev/feast

The Open Source Feature Store for Machine Learning

5,466 ⭐

Top Related Projects

  • Apache Flink (23,783 ⭐)
  • BentoML (6,964 ⭐): The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
  • MLflow (18,287 ⭐): Open source platform for the machine learning lifecycle
  • Kubeflow (14,163 ⭐): Machine Learning Toolkit for Kubernetes
  • Seldon Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
  • Cortex (8,018 ⭐): Production infrastructure for machine learning at scale

Quick Overview

Feast (Feature Store) is an open-source feature store for machine learning. It provides a centralized repository for managing and serving machine learning features, enabling teams to share, discover, and access features across projects and models. Feast helps bridge the gap between data engineering and machine learning, streamlining the process of feature management and serving.

Pros

  • Unified feature management: Centralized storage and access to features across different projects and teams
  • Consistent feature serving: Ensures consistency between training and serving environments
  • Integration with popular ML frameworks: Works well with TensorFlow, PyTorch, and other common ML tools
  • Scalable architecture: Designed to handle large-scale feature management and serving

Cons

  • Learning curve: Requires understanding of feature store concepts and Feast-specific terminology
  • Setup complexity: Initial configuration and infrastructure setup can be challenging for beginners
  • Limited built-in transformations: Advanced feature engineering may still require external tools (see the sketch after this list for what Feast does support)
  • Ongoing maintenance: Requires regular updates and management of feature definitions and data sources
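
One mitigation worth noting for the transformations point: Feast supports lightweight row-level transformations through on-demand feature views. A minimal sketch, assuming a recent Feast release (the field and source names here are made up for illustration):

import pandas as pd
from feast import Field, RequestSource, on_demand_feature_view
from feast.types import Float64, Int64

# A request-time source carrying a raw value to transform at retrieval time
trip_input = RequestSource(
    name="trip_input",
    schema=[Field(name="avg_daily_trips", dtype=Int64)],
)

@on_demand_feature_view(
    sources=[trip_input],
    schema=[Field(name="trips_per_day_squared", dtype=Float64)],
)
def transformed_stats(inputs: pd.DataFrame) -> pd.DataFrame:
    # Row-level Python transformation applied when features are retrieved
    out = pd.DataFrame()
    out["trips_per_day_squared"] = inputs["avg_daily_trips"].astype("float64") ** 2
    return out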

Code Examples

  1. Defining a feature view:
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

driver = Entity(name="driver_id", join_keys=["driver_id"])

# Any registered batch source works here; a Parquet file is the simplest
my_data_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_daily_trips", dtype=Float32),
        Field(name="total_completed_trips", dtype=Int64),
    ],
    source=my_data_source,
)
  2. Retrieving features for model training:
import pandas as pd
from datetime import datetime
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity rows with event timestamps for the point-in-time join
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": [datetime(2021, 4, 12, 10, 59), datetime(2021, 4, 12, 8, 12)],
})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_stats:avg_daily_trips",
        "driver_stats:total_completed_trips",
    ],
).to_df()
  3. Online feature retrieval:
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_stats:avg_daily_trips",
        "driver_stats:total_completed_trips",
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

Getting Started

  1. Install Feast:
pip install feast
  2. Initialize a new Feast project:
feast init my_feature_repo
cd my_feature_repo
  3. Define your features in features.py:
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)
  4. Apply the feature definitions (a verification round trip is sketched below):
feast apply
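
With the definitions applied, a quick round trip verifies the repo end to end: backfill the online store from the batch source, then read a feature vector back. A minimal sketch, assuming the project created above with its default online store:

from datetime import datetime, timedelta
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Backfill the online store from the batch source for the last week
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=7),
    end_date=datetime.utcnow(),
)

# Read a feature vector back at low latency
print(store.get_online_features(
    features=["driver_stats:avg_daily_trips"],
    entity_rows=[{"driver_id": 1001}],
).to_dict())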

Competitor Comparisons


Apache Flink

Pros of Flink

  • Powerful stream processing capabilities with low latency and high throughput
  • Supports both batch and stream processing in a unified framework
  • Large ecosystem with extensive libraries and connectors

Cons of Flink

  • Steeper learning curve due to its complex architecture
  • Requires more resources and infrastructure setup compared to Feast
  • Less focused on feature management for machine learning

Code Comparison

Flink (Stream Processing):

Properties properties = new Properties();  // Kafka connection settings
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties));

// Tokenizer is a user-defined FlatMapFunction emitting (word, 1) pairs
DataStream<Tuple2<String, Integer>> wordCounts = stream
    .flatMap(new Tokenizer())
    .keyBy(value -> value.f0)
    .sum(1);

Feast (Feature Retrieval):

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips"
    ],
    entity_rows=[{"driver_id": 1001}]
)

While Flink excels in stream processing and complex data pipelines, Feast is specifically designed for feature management in machine learning workflows. Flink offers more flexibility for general-purpose data processing, whereas Feast provides a streamlined approach to feature engineering and serving for ML models.
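
The two can also be combined: a stream job computes aggregates and hands them to Feast for serving. A hedged sketch of that hand-off using Feast's push API, assuming a PushSource named driver_stats_push_source has already been registered in the feature repository:

from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Feature rows computed upstream by a streaming job (e.g. Flink)
stream_df = pd.DataFrame({
    "driver_id": [1001],
    "avg_daily_trips": [15],
    "event_timestamp": [datetime.utcnow()],
})

# Push into the online store via the registered PushSource (name assumed)
store.push("driver_stats_push_source", stream_df)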


BentoML

Pros of BentoML

  • Focuses on model serving and deployment, providing a complete MLOps solution
  • Offers built-in model versioning and containerization for easier deployment
  • Supports a wide range of ML frameworks and integrates well with existing ML pipelines

Cons of BentoML

  • Lacks specific feature store capabilities, which Feast specializes in
  • May have a steeper learning curve for teams primarily focused on feature management
  • Less emphasis on data ingestion and transformation compared to Feast

Code Comparison

BentoML example:

# BentoML 0.x (legacy) API
import bentoml
from bentoml.adapters import JsonInput, JsonOutput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@bentoml.env(pip_packages=["scikit-learn"])
@bentoml.artifacts([SklearnModelArtifact("model")])
class SklearnIrisClassifier(bentoml.BentoService):
    @bentoml.api(input=JsonInput(), output=JsonOutput())
    def predict(self, input_data):
        return self.artifacts.model.predict(input_data)

Feast example:

# Feast 0.x (legacy) API
from feast import Entity, Feature, FeatureView, ValueType

driver = Entity(name="driver_id", value_type=ValueType.INT64)
driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=["driver_id"],
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
    ],
    batch_source=batch_source,  # a previously defined batch data source
)

MLflow

Pros of MLflow

  • More comprehensive ML lifecycle management, covering experiment tracking, model packaging, and deployment
  • Broader language support, including Python, R, Java, and more
  • Larger community and ecosystem, with more integrations and plugins available

Cons of MLflow

  • Steeper learning curve due to its broader feature set
  • Can be overkill for simpler ML projects or those focused primarily on feature management
  • Less specialized in feature store capabilities compared to Feast

Code Comparison

MLflow:

import mlflow

mlflow.start_run()
mlflow.log_param("param1", 5)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()

Feast:

from feast import FeatureStore

store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["driver:rating", "driver:trips_today"],
    entity_rows=[{"driver_id": 1001}]
)

The code snippets demonstrate the core functionalities of each tool. MLflow focuses on experiment tracking and logging, while Feast specializes in feature retrieval and management for machine learning models.
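
Because their scopes barely overlap, the two tools compose naturally: Feast assembles the training set and MLflow tracks the run. A minimal sketch under that assumption (the entity dataframe and feature names echo the Feast examples above):

from datetime import datetime

import mlflow
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

entity_df = pd.DataFrame({
    "driver_id": [1001],
    "event_timestamp": [datetime(2021, 4, 12, 10, 59)],
})

# Build the training set from Feast
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
).to_df()

# Track the training run in MLflow
with mlflow.start_run():
    mlflow.log_param("feature_set", "driver_hourly_stats")
    mlflow.log_metric("training_rows", len(training_df))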


Kubeflow

Pros of Kubeflow

  • Comprehensive end-to-end ML platform with support for various stages of the ML lifecycle
  • Strong integration with Kubernetes for scalable and portable deployments
  • Extensive ecosystem with multiple components for different ML tasks

Cons of Kubeflow

  • Steeper learning curve due to its complexity and breadth of features
  • Requires more resources and infrastructure setup compared to Feast
  • May be overkill for simpler ML projects or teams focused primarily on feature management

Code Comparison

Feast (Feature definition):

# Feast 0.x (legacy) API
from datetime import timedelta
from feast import Entity, Feature, FeatureView, ValueType

driver = Entity(name="driver_id", value_type=ValueType.INT64)

driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
    ],
    batch_source=batch_source,  # a previously defined batch data source
)

Kubeflow (Pipeline definition):

import kfp
from kfp import dsl

@dsl.pipeline(name='My pipeline')
def my_pipeline():
    preprocess_op = dsl.ContainerOp(
        name='Preprocess',
        image='preprocess-image:latest',
        arguments=['--input', 'data.csv', '--output', 'processed.csv'],
        file_outputs={'output': 'processed.csv'},  # exposes preprocess_op.output
    )
    train_op = dsl.ContainerOp(
        name='Train',
        image='train-image:latest',
        arguments=['--data', preprocess_op.output]
    )
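
The two also compose rather than compete: Feast CLI commands can run as containerized steps inside a Kubeflow pipeline. A hedged sketch in the same KFP v1 style as above (the container image name is hypothetical):

from kfp import dsl

@dsl.pipeline(name='Feature materialization pipeline')
def feature_pipeline():
    # Run Feast materialization as a containerized step (image is hypothetical)
    materialize_op = dsl.ContainerOp(
        name='materialize-features',
        image='my-feast-image:latest',
        command=['feast', 'materialize-incremental', '2021-04-15T00:00:00'],
    )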

Seldon Core

Pros of Seldon Core

  • Focuses on model deployment and serving, offering advanced features like A/B testing and canary deployments
  • Supports multiple ML frameworks and languages, providing greater flexibility
  • Includes built-in monitoring and explainability tools for deployed models

Cons of Seldon Core

  • Steeper learning curve due to its complexity and Kubernetes-centric architecture
  • Requires more infrastructure setup and management compared to Feast
  • Less emphasis on feature management and data preprocessing

Code Comparison

Seldon Core (Python):

import numpy as np
from seldon_core.seldon_client import SeldonClient

# Model class packaged into a Seldon deployment
class MyModel:
    def predict(self, X, features_names):
        return X * 2

# Client-side call to the deployed model
sc = SeldonClient(deployment_name="mymodel", namespace="default")
client_prediction = sc.predict(data=np.array([[1.0]]))

Feast (Python):

from feast import FeatureStore
store = FeatureStore("feature_repo/")
features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}]
)

Both projects serve different purposes in the ML ecosystem. Seldon Core excels in model deployment and serving, while Feast specializes in feature management and serving. The choice between them depends on specific project requirements and existing infrastructure.
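
In practice the two often sit side by side: Feast assembles the feature vector and Seldon Core serves the model. A hedged sketch against Seldon's v1 REST protocol (the endpoint URL is an assumption; adjust it to your ingress):

import requests
from feast import FeatureStore

store = FeatureStore("feature_repo/")

# Fetch the latest features for the entity from Feast
vec = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

# Send them to the deployed Seldon model (URL assumed)
resp = requests.post(
    "http://localhost:8003/seldon/default/mymodel/api/v1.0/predictions",
    json={"data": {"ndarray": [[vec["conv_rate"][0]]]}},
)
print(resp.json())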


Cortex

Pros of Cortex

  • Provides end-to-end ML deployment and serving capabilities
  • Supports automatic scaling and infrastructure management
  • Offers a unified platform for model deployment across various frameworks

Cons of Cortex

  • Steeper learning curve due to its comprehensive nature
  • May be overkill for simple feature serving use cases
  • Less focus on feature management compared to Feast

Code Comparison

Feast example:

from feast import FeatureStore

store = FeatureStore("feature_repo/")
features = store.get_online_features(
    features=["driver:rating", "driver:trips_today"],
    entity_rows=[{"driver_id": 1001}]
)

Cortex example:

# predictor.py: Cortex's Python predictor interface
class PythonPredictor:
    def __init__(self, config):
        # Load the model once at startup
        self.model = None  # e.g. joblib.load(config["model_path"])

    def predict(self, payload):
        # Model inference logic here
        return self.model.predict(payload)

# Deployed with the CLI: cortex deploy my-api.yaml

While Feast focuses on feature management and serving, Cortex provides a more comprehensive ML deployment solution. Feast is ideal for organizations primarily concerned with feature engineering and serving, whereas Cortex suits those seeking an all-in-one platform for model deployment and scaling.


README



Join us on Slack!

👋👋👋 Come say hi on Slack!

Overview

Feast (Feature Store) is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and online inference, built on top of your existing infrastructure.

Feast allows ML platform teams to:

  • Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).
  • Avoid data leakage by generating point-in-time correct feature sets, so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensures that future feature values do not leak to models during training: a training row stamped 10:59 is only joined against feature values known at or before 10:59.
  • Decouple ML from data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to realtime models, and from one data infra system to another.

Please see our documentation for more information about the project.

📐 Architecture

The above architecture is the minimal Feast deployment. Want to run the full Feast on Snowflake/GCP/AWS? See the documentation.

🐣 Getting Started

1. Install Feast

pip install feast

2. Create a feature repository

feast init my_feature_repo
cd my_feature_repo/feature_repo

3. Register your feature definitions and set up your feature store

feast apply

4. Explore your data in the web UI (experimental)


feast ui

5. Build a training dataset

from feast import FeatureStore
import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8,  12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12)
    ]
})

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

print(training_df.head())

# Train model
# model = ml.fit(training_df)
            event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
0 2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
1 2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
2 2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
3 2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959

6. Load feature values into your online store

CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
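
The same step can be run from the Python SDK instead of the CLI; a minimal sketch, assuming the repository from the previous steps:

from datetime import datetime
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Materialize all registered feature views up to the current time
store.materialize_incremental(end_date=datetime.utcnow())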

7. Read online features at low latency

from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)

# Make prediction
# model.predict(feature_vector)
{
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72]
}

📦 Functionality and Roadmap

Contributors maintain a roadmap of planned functionality for Feast; see the official documentation for the current list.

🎓 Important Resources

Please refer to the official documentation.

👋 Contributing

Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project.

✨ Contributors

Thanks goes to these incredible people.