tensorflow/tfx

TFX is an end-to-end platform for deploying production ML pipelines

Top Related Projects

  • Pipelines: Machine Learning Pipelines for Kubeflow
  • Airflow: Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
  • MLflow: Open source platform for the machine learning lifecycle
  • Feast: The Open Source Feature Store for Machine Learning
  • BentoML: The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

Quick Overview

TensorFlow Extended (TFX) is an end-to-end platform for deploying production machine learning pipelines. It provides a set of components and tools that help data scientists and engineers build, test, and deploy robust ML systems.

Pros

  • Scalable and Extensible: TFX is built on top of TensorFlow and can scale to handle large-scale data and models.
  • Modular Design: TFX has a modular design, allowing users to easily integrate custom components or replace existing ones.
  • Production-Ready: TFX is designed for production use, with features like monitoring, versioning, and deployment automation.
  • Ecosystem Integration: TFX integrates with various tools and services in the ML ecosystem, such as Kubeflow, Apache Beam, and BigQuery.

Cons

  • Steep Learning Curve: TFX has a relatively steep learning curve, especially for users new to the TensorFlow ecosystem.
  • Limited Documentation: The documentation for TFX, while improving, can still be challenging to navigate for some users.
  • Performance Overhead: The additional abstraction and tooling provided by TFX can introduce some performance overhead compared to a more lightweight approach.
  • Vendor Lock-in: TFX is tightly integrated with the TensorFlow ecosystem, which may limit its adoption for users who prefer other ML frameworks.

Code Examples

# Define a simple TFX pipeline
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
from tfx.components.trainer.component import Trainer
from tfx.dsl.components.base.executor_spec import ExecutorClassSpec
from tfx.extensions.google_cloud_ai_platform.trainer.executor import Executor as AITrainerExecutor

# Create an InteractiveContext
context = InteractiveContext()

# Define the components
example_gen = CsvExampleGen(input_base='path/to/data')
trainer = Trainer(
    module_file='path/to/trainer_module.py',
    custom_executor_spec=ExecutorClassSpec(AITrainerExecutor),
    examples=example_gen.outputs['examples'])

# Run the components one at a time; InteractiveContext.run takes a single component
context.run(example_gen)
context.run(trainer)

This example defines two components, CsvExampleGen and Trainer, and runs them one after the other in an InteractiveContext. CsvExampleGen reads the CSV files under the input directory and emits examples; Trainer then trains a model on those examples using the Google Cloud AI Platform Trainer executor.

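# Define a custom TFX component
from tfx import types
from tfx.dsl.components.base import base_component, base_executor, executor_spec
from tfx.types import standard_artifacts
from tfx.types.component_spec import ChannelParameter, ComponentSpec

class MyCustomComponentSpec(ComponentSpec):
    # Declares the component's parameters, input channels, and output channels
    PARAMETERS = {}
    INPUTS = {'input_data': ChannelParameter(type=standard_artifacts.Examples)}
    OUTPUTS = {'output_data': ChannelParameter(type=standard_artifacts.Examples)}

class MyCustomComponentExecutor(base_executor.BaseExecutor):
    def Do(self, input_dict, output_dict, exec_properties):
        # Read the artifacts in input_dict['input_data'] and write results
        # under output_dict['output_data'][0].uri
        pass

class MyCustomComponent(base_component.BaseComponent):
    SPEC_CLASS = MyCustomComponentSpec
    EXECUTOR_SPEC = executor_spec.ExecutorClassSpec(MyCustomComponentExecutor)

    def __init__(self, input_data, output_data=None):
        # Create the output channel when the caller does not supply one
        output_data = output_data or types.Channel(type=standard_artifacts.Examples)
        spec = MyCustomComponentSpec(input_data=input_data, output_data=output_data)
        super().__init__(spec=spec)

# Use the custom component in a pipeline
my_custom_component = MyCustomComponent(
    input_data=example_gen.outputs['examples'])

context.run(my_custom_component)

This code example demonstrates how to define a custom TFX component and run it after the components defined earlier. A custom component has three parts: a ComponentSpec that declares its inputs and outputs, an executor whose Do method performs the work, and a BaseComponent subclass that ties the two together and passes the populated spec to the base class.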

Getting Started

To get started with TFX, you can follow these steps:

  1. Install the TFX library:
pip install tfx
  2. Create a new TFX pipeline:
from tfx.orchestration.experimental.interactive.interactive_

Competitor Comparisons

Machine Learning Pipelines for Kubeflow

Pros of Pipelines

  • More flexible and language-agnostic, supporting multiple ML frameworks
  • Better integration with Kubernetes ecosystem and cloud-native technologies
  • Stronger focus on end-to-end ML workflows, including deployment and monitoring

Cons of Pipelines

  • Steeper learning curve due to Kubernetes complexity
  • Less seamless integration with TensorFlow-specific features
  • Potentially more resource-intensive for smaller projects

Code Comparison

TFX example:

import tfx
from tfx.components import CsvExampleGen

example_gen = CsvExampleGen(input_base='/path/to/data')

Pipelines example:

# Uses the KFP v1 SDK; dsl.ContainerOp is not available in the KFP v2 SDK
from kfp import dsl

@dsl.pipeline(name='My pipeline')
def my_pipeline():
    data_op = dsl.ContainerOp(
        name='Load Data',
        image='data-loader:latest',
        arguments=['--data-path', '/path/to/data']
    )

TFX is more tightly integrated with TensorFlow, offering a simpler setup for TensorFlow-based projects. Pipelines provides a more flexible, container-based approach that can accommodate various ML frameworks and tools, but requires more configuration and Kubernetes knowledge.

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Pros of Airflow

  • More general-purpose workflow orchestration, suitable for a wide range of data processing tasks
  • Larger community and ecosystem with extensive plugins and integrations
  • Easier to set up and use for non-ML-specific workflows

Cons of Airflow

  • Less specialized for machine learning pipelines compared to TFX
  • May require more custom code for ML-specific tasks
  • Lacks built-in ML model versioning and metadata tracking

Code Comparison

Airflow DAG example:

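from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path

def process_data():
    # Data processing logic goes here
    pass

# Placeholder defaults; adjust to your deployment
default_args = {'start_date': datetime(2024, 1, 1)}

dag = DAG('data_pipeline', default_args=default_args, schedule_interval=timedelta(days=1))
process_task = PythonOperator(task_id='process_data', python_callable=process_data, dag=dag)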

TFX pipeline example:

from tfx import components
from tfx.orchestration import pipeline

def create_pipeline(pipeline_name, pipeline_root, data_root):
    # Each component consumes the outputs of the one before it
    example_gen = components.CsvExampleGen(input_base=data_root)
    statistics_gen = components.StatisticsGen(examples=example_gen.outputs['examples'])
    schema_gen = components.SchemaGen(statistics=statistics_gen.outputs['statistics'])
    return pipeline.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen])

Open source platform for the machine learning lifecycle

Pros of MLflow

  • More lightweight and flexible, easier to integrate with various ML frameworks
  • Better support for experiment tracking and model versioning
  • Simpler setup and usage, with a lower learning curve

Cons of MLflow

  • Less comprehensive end-to-end ML pipeline management
  • Fewer built-in components for data validation and preprocessing
  • Limited support for large-scale distributed training

Code Comparison

MLflow:

import mlflow

# Placeholder values; log whatever your run actually produces
with mlflow.start_run():
    mlflow.log_param("param1", 5)
    mlflow.log_metric("metric1", 0.87)

TFX:

from tfx import components
from tfx.orchestration import pipeline

example_gen = components.CsvExampleGen(input_base=data_root)
statistics_gen = components.StatisticsGen(examples=example_gen.outputs['examples'])
schema_gen = components.SchemaGen(statistics=statistics_gen.outputs['statistics'])

MLflow focuses on experiment tracking and model management, while TFX provides a more comprehensive pipeline for production ML workflows. MLflow is easier to adopt and works with many ML frameworks, whereas TFX offers more robust data validation and preprocessing, especially within the TensorFlow ecosystem. The code examples contrast MLflow's few calls for logging an experiment with TFX's chain of components, here for data ingestion, statistics, and schema generation.

The Open Source Feature Store for Machine Learning

Pros of Feast

  • Lightweight and focused specifically on feature management
  • Easier to integrate with existing data infrastructure
  • Supports multiple data sources and storage backends out-of-the-box

Cons of Feast

  • Less comprehensive ML pipeline support compared to TFX
  • Smaller community and ecosystem
  • Limited built-in model training and serving capabilities

Code Comparison

Feast example:

from feast import FeatureStore

store = FeatureStore("feature_repo/")
features = store.get_online_features(
    features=["driver:rating", "driver:trips_today"],
    entity_rows=[{"driver_id": 1001}]
)

TFX example:

from tfx import components
from tfx.orchestration import pipeline

example_gen = components.CsvExampleGen(input_base="data/")
statistics_gen = components.StatisticsGen(examples=example_gen.outputs['examples'])
schema_gen = components.SchemaGen(statistics=statistics_gen.outputs['statistics'])

Feast focuses on feature retrieval and management, while TFX provides a more comprehensive pipeline for data processing, model training, and deployment. Feast is more suitable for teams looking to add feature management to existing ML workflows, while TFX offers an end-to-end solution for building production-ready ML pipelines.

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

Pros of BentoML

  • Lightweight and flexible, easier to get started with for smaller projects
  • Supports a wider range of ML frameworks beyond TensorFlow
  • Focuses on model serving and deployment, with built-in performance optimizations

Cons of BentoML

  • Less comprehensive end-to-end ML pipeline support compared to TFX
  • Smaller community and ecosystem than TensorFlow/TFX
  • May require more manual configuration for complex production scenarios

Code Comparison

BentoML example:

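# Legacy BentoML 0.13-style service API
import bentoml
from bentoml.adapters import JsonInput, JsonOutput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@bentoml.env(pip_packages=["scikit-learn"])
@bentoml.artifacts([SklearnModelArtifact('model')])
class SklearnIrisClassifier(bentoml.BentoService):
    @bentoml.api(input=JsonInput(), output=JsonOutput())
    def predict(self, input_data):
        return self.artifacts.model.predict(input_data)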

TFX example:

from tfx.components import Trainer
from tfx.proto import trainer_pb2

# module_file, transform, and schema_gen are assumed to be defined earlier in the pipeline
trainer = Trainer(
    module_file=module_file,
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000)
)

README

TFX

TensorFlow Extended (TFX) is a Google-production-scale machine learning platform based on TensorFlow. It provides a configuration framework to express ML pipelines consisting of TFX components. TFX pipelines can be orchestrated using Apache Airflow and Kubeflow Pipelines. Both the components and the integrations with orchestration systems can be extended.
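
To make the idea of a pipeline "consisting of components" concrete, here is a minimal sketch in plain Python of how an orchestrator could derive an execution order from component dependencies. It is not TFX's implementation: the `Component` dataclass and the `execution_order` helper are hypothetical names, and the sketch assumes each component simply lists the upstream components whose outputs it consumes.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a pipeline component (not a TFX class)."""
    name: str
    upstream: tuple = ()  # names of the components this one depends on


def execution_order(components):
    """Return component names in an order that respects data dependencies."""
    graph = {c.name: set(c.upstream) for c in components}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


# Components may be declared in any order; dependencies decide the run order.
pipeline = [
    Component("SchemaGen", upstream=("StatisticsGen",)),
    Component("StatisticsGen", upstream=("CsvExampleGen",)),
    Component("CsvExampleGen"),
]
order = execution_order(pipeline)
print(order)  # ['CsvExampleGen', 'StatisticsGen', 'SchemaGen']
```

An orchestrator such as Airflow or Kubeflow Pipelines would schedule the components in a dependency-respecting order like this one, with the added ability to run independent branches in parallel.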

TFX components interact with an ML Metadata backend that keeps a record of component runs, input and output artifacts, and runtime configuration. This metadata backend enables advanced functionality like experiment tracking or warmstarting/resuming ML models from previous runs.
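
The following toy sketch illustrates why recording runs, artifacts, and configuration is useful: a later run with identical inputs and configuration can reuse earlier outputs instead of recomputing them. It is an in-memory illustration only and does not use the ML Metadata API; `MetadataStore`, `run_component`, and the artifact paths are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy in-memory record of component runs (not the ML Metadata API)."""
    executions: list = field(default_factory=list)

    @staticmethod
    def _key(component, inputs, config):
        # Fingerprint the component name, input artifact URIs, and runtime config.
        payload = json.dumps([component, sorted(inputs), config], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def lookup(self, component, inputs, config):
        """Return the outputs of a previous identical run, or None."""
        key = self._key(component, inputs, config)
        for record in self.executions:
            if record["key"] == key:
                return record["outputs"]
        return None

    def record(self, component, inputs, config, outputs):
        """Store one execution together with its inputs, config, and outputs."""
        self.executions.append({
            "key": self._key(component, inputs, config),
            "component": component,
            "inputs": list(inputs),
            "config": config,
            "outputs": list(outputs),
        })


def run_component(store, component, inputs, config, fn):
    """Run fn unless an identical execution is already recorded.

    Returns (outputs, reused), where reused tells whether the result came
    from the store.
    """
    cached = store.lookup(component, inputs, config)
    if cached is not None:
        return cached, True
    outputs = fn(inputs, config)
    store.record(component, inputs, config, outputs)
    return outputs, False


def fake_train(inputs, config):
    # Hypothetical stand-in for real training; returns an output artifact URI.
    return [f"models/steps-{config['num_steps']}"]


store = MetadataStore()
first = run_component(store, "Trainer", ["data/examples"], {"num_steps": 100}, fake_train)
second = run_component(store, "Trainer", ["data/examples"], {"num_steps": 100}, fake_train)
print(first)   # (['models/steps-100'], False)
print(second)  # (['models/steps-100'], True)
```

Because each record keeps the inputs and configuration alongside the outputs, the same store can also answer lineage questions such as which data and settings produced a given model.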

TFX Components

Documentation

User Documentation

Please see the TFX User Guide.

Development References

Roadmap

The TFX Roadmap, which is updated quarterly.

Release Details

For detailed previous and upcoming changes, please check here

Requests For Comment

TFX is an open-source project and we strongly encourage active participation by the ML community in helping to shape TFX to meet or exceed their needs. An important component of that effort is the RFC process. Please see the listing of current and past TFX RFCs. Please see the TensorFlow Request for Comments (TF-RFC) process page for information on how community members can contribute.

Examples

Compatible versions

The following table describes how the tfx package versions are compatible with its major dependency PyPI packages. This is determined by our testing framework, but other untested combinations may also work.

tfxPythonapache-beam[gcp]ml-metadatapyarrowtensorflowtensorflow-data-validationtensorflow-metadatatensorflow-model-analysistensorflow-serving-apitensorflow-transformtfx-bsl
GitHub master>=3.9,<3.112.59.01.16.010.0.1nightly (2.x)1.16.11.16.10.47.02.16.11.16.01.16.1
1.16.0>=3.9,<3.112.59.01.16.010.0.12.161.16.11.16.10.47.02.16.11.16.01.16.1
1.15.0>=3.9,<3.112.47.01.15.010.0.02.151.15.11.15.00.46.02.15.11.15.01.15.1
1.14.0>=3.8,<3.112.47.01.14.010.0.02.131.14.01.14.00.45.02.9.01.14.01.14.0
1.13.0>=3.8,<3.102.40.01.13.16.0.02.121.13.01.13.10.44.02.9.01.13.01.13.0
1.12.0>=3.7,<3.102.40.01.12.06.0.02.111.12.01.12.00.43.02.9.01.12.01.12.0
1.11.0>=3.7,<3.102.40.01.11.06.0.01.15.5 / 2.10.01.11.01.11.00.42.02.9.01.11.01.11.0
1.10.0>=3.7,<3.102.40.01.10.06.0.01.15.5 / 2.9.01.10.01.10.00.41.02.9.01.10.01.10.0
1.9.0>=3.7,<3.102.38.01.9.05.0.01.15.5 / 2.9.01.9.01.9.00.40.02.9.01.9.01.9.0
1.8.0>=3.7,<3.102.38.01.8.05.0.01.15.5 / 2.8.01.8.01.8.00.39.02.8.01.8.01.8.0
1.7.0>=3.7,<3.92.36.01.7.05.0.01.15.5 / 2.8.01.7.01.7.00.38.02.8.01.7.01.7.0
1.6.2>=3.7,<3.92.35.01.6.05.0.01.15.5 / 2.8.01.6.01.6.00.37.02.7.01.6.01.6.0
1.6.0>=3.7,<3.92.35.01.6.05.0.01.15.5 / 2.7.01.6.01.6.00.37.02.7.01.6.01.6.0
1.5.0>=3.7,<3.92.34.01.5.05.0.01.15.2 / 2.7.01.5.01.5.00.36.02.7.01.5.01.5.0
1.4.0>=3.7,<3.92.33.01.4.05.0.01.15.0 / 2.6.01.4.01.4.00.35.02.6.01.4.01.4.0
1.3.4>=3.6,<3.92.32.01.3.02.0.01.15.0 / 2.6.01.3.01.2.00.34.12.6.01.3.01.3.0
1.3.3>=3.6,<3.92.32.01.3.02.0.01.15.0 / 2.6.01.3.01.2.00.34.12.6.01.3.01.3.0
1.3.2>=3.6,<3.92.32.01.3.02.0.01.15.0 / 2.6.01.3.01.2.00.34.12.6.01.3.01.3.0
1.3.1>=3.6,<3.92.32.01.3.02.0.01.15.0 / 2.6.01.3.01.2.00.34.12.6.01.3.01.3.0
1.3.0>=3.6,<3.92.32.01.3.02.0.01.15.0 / 2.6.01.3.01.2.00.34.12.6.01.3.01.3.0
1.2.1>=3.6,<3.92.31.01.2.02.0.01.15.0 / 2.5.01.2.01.2.00.33.02.5.11.2.01.2.0
1.2.0>=3.6,<3.92.31.01.2.02.0.01.15.0 / 2.5.01.2.01.2.00.33.02.5.11.2.01.2.0
1.0.0>=3.6,<3.92.29.01.0.02.0.01.15.0 / 2.5.01.0.01.0.00.31.02.5.11.0.01.0.0
0.30.0>=3.6,<3.92.28.00.30.02.0.01.15.0 / 2.4.00.30.00.30.00.30.02.4.00.30.00.30.0
0.29.0>=3.6,<3.92.28.00.29.02.0.01.15.0 / 2.4.00.29.00.29.00.29.02.4.00.29.00.29.0
0.28.0>=3.6,<3.92.28.00.28.02.0.01.15.0 / 2.4.00.28.00.28.00.28.02.4.00.28.00.28.1
0.27.0>=3.6,<3.92.27.00.27.02.0.01.15.0 / 2.4.00.27.00.27.00.27.02.4.00.27.00.27.0
0.26.4>=3.6,<3.92.28.00.26.00.17.01.15.0 / 2.3.00.26.10.26.00.26.02.3.00.26.00.26.0
0.26.3>=3.6,<3.92.25.00.26.00.17.01.15.0 / 2.3.00.26.00.26.00.26.02.3.00.26.00.26.0
0.26.1>=3.6,<3.92.25.00.26.00.17.01.15.0 / 2.3.00.26.00.26.00.26.02.3.00.26.00.26.0
0.26.0>=3.6,<3.92.25.00.26.00.17.01.15.0 / 2.3.00.26.00.26.00.26.02.3.00.26.00.26.0
0.25.0>=3.6,<3.92.25.00.24.00.17.01.15.0 / 2.3.00.25.00.25.00.25.02.3.00.25.00.25.0
0.24.1>=3.6,<3.92.24.00.24.00.17.01.15.0 / 2.3.00.24.10.24.00.24.32.3.00.24.10.24.1
0.24.0>=3.6,<3.92.24.00.24.00.17.01.15.0 / 2.3.00.24.10.24.00.24.32.3.00.24.10.24.1
0.23.1>=3.5,<42.24.00.23.00.17.01.15.0 / 2.3.00.23.10.23.00.23.02.3.00.23.00.23.0
0.23.0>=3.5,<42.23.00.23.00.17.01.15.0 / 2.3.00.23.00.23.00.23.02.3.00.23.00.23.0
0.22.2>=3.5,<42.21.00.22.10.16.01.15.0 / 2.2.00.22.20.22.20.22.22.2.00.22.00.22.1
0.22.1>=3.5,<42.21.00.22.10.16.01.15.0 / 2.2.00.22.20.22.20.22.22.2.00.22.00.22.1
0.22.0>=3.5,<42.21.00.22.00.16.01.15.0 / 2.2.00.22.00.22.00.22.12.2.00.22.00.22.0
0.21.5>=2.7,<3 or >=3.5,<42.17.00.21.20.15.01.15.0 / 2.1.00.21.50.21.10.21.52.1.00.21.20.21.4
0.21.4>=2.7,<3 or >=3.5,<42.17.00.21.20.15.01.15.0 / 2.1.00.21.50.21.10.21.52.1.00.21.20.21.4
0.21.3>=2.7,<3 or >=3.5,<42.17.00.21.20.15.01.15.0 / 2.1.00.21.50.21.10.21.52.1.00.21.20.21.4
0.21.2>=2.7,<3 or >=3.5,<42.17.00.21.20.15.01.15.0 / 2.1.00.21.50.21.10.21.52.1.00.21.20.21.4
0.21.1>=2.7,<3 or >=3.5,<42.17.00.21.20.15.01.15.0 / 2.1.00.21.40.21.10.21.42.1.00.21.20.21.3
0.21.0>=2.7,<3 or >=3.5,<42.17.00.21.00.15.01.15.0 / 2.1.00.21.00.21.00.21.12.1.00.21.00.21.0
0.15.0>=2.7,<3 or >=3.5,<42.16.00.15.00.15.01.15.00.15.00.15.00.15.21.15.00.15.00.15.1
0.14.0>=2.7,<3 or >=3.5,<42.14.00.14.00.14.01.14.00.14.10.14.00.14.01.14.00.14.0n/a
0.13.0>=2.7,<3 or >=3.5,<42.12.00.13.2n/a1.13.10.13.10.13.00.13.21.13.00.13.0n/a
0.12.0>=2.7,<32.10.00.13.2n/a1.12.00.12.00.12.10.12.11.12.00.12.0n/a
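
As a hedged illustration of how the Python column can be read, the sketch below checks a Python version against the range listed for a tfx release. It uses only the standard library and a few rows copied from the table; the helper names (`python_supported`, `_clause_holds`, `_as_tuple`) are hypothetical and not part of TFX. A `False` result only means the combination is outside the tested range, since untested combinations may still work.

```python
import re

# Rows copied from the compatibility table: tfx version -> supported Python range.
PYTHON_REQUIREMENTS = {
    "1.16.0": ">=3.9,<3.11",
    "1.14.0": ">=3.8,<3.11",
    "1.12.0": ">=3.7,<3.10",
    "0.21.0": ">=2.7,<3 or >=3.5,<4",
}


def _as_tuple(version, width=3):
    """Turn '3.9' into (3, 9, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def _clause_holds(python_version, clause):
    """Evaluate one clause such as '>=3.9' or '<3.11'."""
    match = re.fullmatch(r"(>=|<)(\d+(?:\.\d+)*)", clause.strip())
    if match is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    op, bound = match.groups()
    have, want = _as_tuple(python_version), _as_tuple(bound)
    return have >= want if op == ">=" else have < want


def python_supported(tfx_version, python_version):
    """True if python_version falls inside the tested range for tfx_version."""
    spec = PYTHON_REQUIREMENTS[tfx_version]
    # 'or' separates alternative ranges; commas join clauses that must all hold.
    return any(
        all(_clause_holds(python_version, c) for c in alternative.split(","))
        for alternative in spec.split(" or ")
    )


print(python_supported("1.16.0", "3.10"))  # True
print(python_supported("1.16.0", "3.11"))  # False
print(python_supported("0.21.0", "2.7"))   # True
```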

Resources