ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Top Related Projects
- Dask: Parallel computing with task scheduling
- Apache Spark: A unified analytics engine for large-scale data processing
- Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
- DeepSpeed: A deep learning optimization library that makes distributed training and inference easy, efficient, and effective
- PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration
- MLflow: Open source platform for the machine learning lifecycle
Quick Overview
Ray is an open-source distributed computing framework that enables scaling Python applications from a laptop to a cluster. It provides a simple, universal API for building distributed applications and includes libraries for machine learning, reinforcement learning, and other compute-intensive tasks.
Pros
- Seamless scaling from local development to large clusters
- Rich ecosystem of libraries for AI/ML tasks (e.g., Ray Tune, RLlib)
- Easy-to-use API for distributed computing
- Active community and regular updates
Cons
- Steeper learning curve for complex distributed systems
- Documentation can be overwhelming for beginners
- Some features may be overkill for smaller projects
- Occasional stability issues in cutting-edge features
Code Examples
- Simple parallel task execution:

import ray

@ray.remote
def f(x):
    return x * x

ray.init()

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
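Remote calls return futures immediately, so tasks compose without blocking; a short sketch reusing f from above (Ray resolves a future passed as an argument before the dependent task runs):

composed = f.remote(f.remote(2))  # inner task yields 4, outer squares it
print(ray.get(composed))  # 16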
- Distributed hyperparameter tuning with Ray Tune:

from ray import tune

def objective(config):
    score = config["x"] ** 2 + config["y"] ** 2
    return {"score": score}

analysis = tune.run(
    objective,
    config={
        "x": tune.uniform(-10, 10),
        "y": tune.uniform(-10, 10),
    },
    num_samples=100,
)
print("Best config:", analysis.get_best_config(metric="score", mode="min"))
- Actor-based stateful computation:

import ray

@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

ray.init()

counter = Counter.remote()
print(ray.get([counter.increment.remote() for _ in range(5)]))  # [1, 2, 3, 4, 5]
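Actor handles are ordinary values, so they can be passed into tasks to share state; a short sketch reusing the Counter above:

@ray.remote
def bump(counter):
    # the task calls back into the shared actor
    return ray.get(counter.increment.remote())

print(ray.get(bump.remote(counter)))  # 6, continuing the count above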
Getting Started
To get started with Ray:
- Install Ray:

pip install ray

- Import Ray and initialize:

import ray

ray.init()

- Define remote functions or actors:

@ray.remote
def my_function(x):
    return x * 2

result = ray.get(my_function.remote(4))
print(result)  # 8
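The same code scales out by pointing ray.init at a running cluster; a minimal sketch, assuming a cluster has already been started on this node with the ray start CLI:

import ray

# connect to the existing cluster instead of starting a local one
ray.init(address="auto")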
For more advanced usage, refer to the official Ray documentation for specific libraries and use cases.
Competitor Comparisons
Dask: Parallel computing with task scheduling
Pros of Dask
- Seamless integration with NumPy and Pandas, making it easier for data scientists familiar with these libraries
- More mature and established ecosystem for data analysis and scientific computing
- Better support for out-of-core computations and handling larger-than-memory datasets
Cons of Dask
- Less suitable for general-purpose distributed computing tasks
- Limited support for machine learning workloads compared to Ray
- Slower performance for certain types of parallel computations
Code Comparison
Dask:

import dask.dataframe as dd

df = dd.read_csv('large_file.csv')
result = df.groupby('column').mean().compute()

Ray:

import numpy as np
import pandas as pd
import ray

@ray.remote
def process_chunk(chunk):
    return chunk.groupby('column').mean()

df = pd.read_csv('large_file.csv')
# note: this yields a list of per-chunk means; a real pipeline would add a
# reduce step to combine them into global statistics
result = ray.get([process_chunk.remote(chunk) for chunk in np.array_split(df, 100)])
This comparison shows how Dask provides a more familiar interface for data processing tasks, while Ray offers more flexibility for custom parallel computations.
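The two are not mutually exclusive, either: Ray ships a Dask scheduler integration, so Dask collections can execute on a Ray cluster. A minimal sketch, assuming both dask and ray are installed (the CSV path reuses the example above):

import dask.dataframe as dd
import ray
from ray.util.dask import enable_dask_on_ray

ray.init()
enable_dask_on_ray()  # route Dask task graphs to the Ray scheduler

df = dd.read_csv('large_file.csv')
result = df.groupby('column').mean().compute()  # executes on Ray workers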
Apache Spark: A unified analytics engine for large-scale data processing
Pros of Spark
- Mature ecosystem with extensive libraries and integrations
- Robust support for SQL and structured data processing
- Strong community and enterprise adoption
Cons of Spark
- Steeper learning curve for beginners
- Higher resource consumption for small-scale tasks
- Less flexible for general-purpose distributed computing
Code Comparison
Spark:
val df = spark.read.json("data.json")
df.groupBy("category").agg(avg("price")).show()
Ray:

import ray

@ray.remote
def process_data(data):
    # custom data processing logic goes here (result is a placeholder)
    return result

# data_chunks is a placeholder for pre-partitioned input
results = ray.get([process_data.remote(chunk) for chunk in data_chunks])
Key Differences
- Spark focuses on big data processing and analytics, while Ray is more versatile for distributed computing tasks
- Ray offers easier scaling for machine learning and AI workloads
- Spark has better built-in support for SQL and DataFrames, while Ray provides more flexibility for custom distributed algorithms
Use Cases
- Choose Spark for large-scale data processing, ETL, and SQL-based analytics
- Opt for Ray when building distributed applications, reinforcement learning systems, or scaling Python code
Performance Considerations
- Spark excels at batch processing of large datasets
- Ray offers lower latency for fine-grained tasks and better support for GPU acceleration (see the sketch below)
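As an illustration of the GPU point, Ray tasks can declaratively reserve accelerators; a minimal sketch, assuming a machine or cluster with at least one GPU available:

import ray

ray.init()

@ray.remote(num_gpus=1)  # scheduled only on a worker with a free GPU
def train_on_gpu(data):
    # framework-specific GPU work would go here (placeholder)
    return len(data)

print(ray.get(train_on_gpu.remote([1, 2, 3])))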
Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
Pros of Horovod
- Specialized for distributed deep learning, particularly with TensorFlow, PyTorch, and MXNet
- Efficient communication using ring-allreduce algorithm
- Easier to scale existing single-GPU training scripts to multiple GPUs or nodes
Cons of Horovod
- Limited to deep learning workloads, less versatile for general distributed computing
- Requires more manual configuration for complex distributed setups
- Less extensive ecosystem and fewer built-in features compared to Ray
Code Comparison
Horovod:

import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()
# scale the learning rate by the number of workers, per Horovod convention
optimizer = tf.optimizers.Adam(0.001 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)
Ray:

import ray

@ray.remote
def train_func():
    # model and train_data are placeholders for a real training setup
    model.fit(train_data)

results = ray.get([train_func.remote() for _ in range(num_workers)])
Summary
Horovod excels in distributed deep learning scenarios, offering efficient communication and easy scaling of existing scripts. Ray, on the other hand, provides a more versatile distributed computing framework suitable for various tasks beyond deep learning. While Horovod focuses on optimizing specific deep learning workflows, Ray offers a broader ecosystem with more built-in features for general distributed computing needs.
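On the Ray side, the idiomatic counterpart for data-parallel training is Ray Train, which runs a per-worker training loop across the cluster. A minimal sketch, assuming a recent Ray 2.x with PyTorch installed (the loop body is a placeholder):

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker():
    # per-worker PyTorch training code goes here (placeholder)
    pass

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
result = trainer.fit()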
DeepSpeed: A deep learning optimization library that makes distributed training and inference easy, efficient, and effective
Pros of DeepSpeed
- Specialized for deep learning model training and inference optimization
- Offers advanced memory optimization techniques like ZeRO (Zero Redundancy Optimizer)
- Tightly integrated with PyTorch for seamless usage
Cons of DeepSpeed
- More focused on deep learning, less versatile for general distributed computing
- Steeper learning curve for users not familiar with PyTorch ecosystem
- Limited support for other ML frameworks compared to Ray's broader ecosystem
Code Comparison
DeepSpeed:

import deepspeed

# args, model, params, and data_loader are placeholders for a real setup
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=params
)
for step, batch in enumerate(data_loader):
    loss = model_engine(batch)
    model_engine.backward(loss)
    model_engine.step()
Ray:

import ray

@ray.remote
def train_step(model, batch):
    # schematic: this ships the model with every task; a real setup would
    # hold the model in an actor or place it in the object store via ray.put
    loss = model(batch)
    return loss

results = ray.get([train_step.remote(model, batch) for batch in data_loader])
Both frameworks aim to simplify distributed computing, but DeepSpeed focuses on optimizing deep learning workloads, while Ray offers a more general-purpose distributed computing solution. DeepSpeed provides deeper integration with PyTorch and specialized optimizations for large models, whereas Ray offers greater flexibility across various computing tasks and ML frameworks.
PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- More mature and widely adopted in the deep learning community
- Extensive ecosystem of pre-trained models and libraries
- Dynamic computational graphs for easier debugging and flexibility
Cons of PyTorch
- Less suitable for distributed computing and large-scale data processing
- Limited support for non-deep learning machine learning tasks
- Steeper learning curve for beginners compared to Ray
Code Comparison
PyTorch example:

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)

Ray example:

import ray

@ray.remote
def add(x, y):
    return x + y

result = ray.get(add.remote(1, 2))
PyTorch focuses on tensor operations and neural network building blocks, while Ray emphasizes distributed computing and parallel task execution. PyTorch is primarily used for deep learning tasks, whereas Ray is a more general-purpose framework for scaling Python applications and machine learning workloads.
MLflow: Open source platform for the machine learning lifecycle
Pros of MLflow
- Focused on experiment tracking and model management
- Easier to integrate with existing ML workflows
- More comprehensive model versioning and deployment features
Cons of MLflow
- Less scalable for distributed computing tasks
- Limited support for parallel and distributed training
Code Comparison
MLflow:

import mlflow

# value1 and value2 are placeholders for real parameter/metric values
mlflow.start_run()
mlflow.log_param("param1", value1)
mlflow.log_metric("metric1", value2)
mlflow.end_run()

Ray:

import ray

@ray.remote
def task(x):
    return x * x

futures = [task.remote(i) for i in range(4)]
print(ray.get(futures))
MLflow focuses on tracking experiments and managing models, making it easier to log parameters, metrics, and artifacts. Ray, on the other hand, excels at distributed computing and parallel task execution, providing a more scalable solution for large-scale machine learning workloads.
While MLflow offers better model versioning and deployment features, Ray provides a more robust framework for distributed computing and scaling machine learning pipelines. MLflow is generally easier to integrate into existing workflows, but Ray offers more flexibility for complex, distributed tasks.
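The two can also be used together: Ray Tune ships an MLflow logger callback that records each trial as an MLflow run. A minimal sketch, assuming a recent Ray 2.x and mlflow are installed (the experiment name is illustrative, and the callback's import path has moved between Ray versions):

from ray import tune
from ray.air.integrations.mlflow import MLflowLoggerCallback

def objective(config):
    return {"score": config["x"] ** 2}

analysis = tune.run(
    objective,
    config={"x": tune.uniform(-1, 1)},
    num_samples=10,
    callbacks=[MLflowLoggerCallback(experiment_name="tune_demo")],
)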
README
.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/ray_header_logo.png

.. image:: https://readthedocs.org/projects/ray/badge/?version=master
   :target: http://docs.ray.io/en/master/?badge=master

.. image:: https://img.shields.io/badge/Ray-Join%20Slack-blue
   :target: https://forms.gle/9TSdDYUgxYs8SA9e8

.. image:: https://img.shields.io/badge/Discuss-Ask%20Questions-blue
   :target: https://discuss.ray.io/

.. image:: https://img.shields.io/twitter/follow/raydistributed.svg?style=social&logo=twitter
   :target: https://twitter.com/raydistributed
.. image:: https://img.shields.io/badge/Get_started_for_free-3C8AE9?logo=data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8%2F9hAAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAEKADAAQAAAABAAAAEAAAAAA0VXHyAAABKElEQVQ4Ea2TvWoCQRRGnWCVWChIIlikC9hpJdikSbGgaONbpAoY8gKBdAGfwkfwKQypLQ1sEGyMYhN1Pd%2B6A8PqwBZeOHt%2FvsvMnd3ZXBRFPQjBZ9K6OY8ZxF%2B0IYw9PW3qz8aY6lk92bZ%2BVqSI3oC9T7%2FyCVnrF1ngj93us%2B540sf5BrCDfw9b6jJ5lx%2FyjtGKBBXc3cnqx0INN4ImbI%2Bl%2BPnI8zWfFEr4chLLrWHCp9OO9j19Kbc91HX0zzzBO8EbLK2Iv4ZvNO3is3h6jb%2BCwO0iL8AaWqB7ILPTxq3kDypqvBuYuwswqo6wgYJbT8XxBPZ8KS1TepkFdC79TAHHce%2F7LbVioi3wEfTpmeKtPRGEeoldSP%2FOeoEftpP4BRbgXrYZefsAI%2BP9JU7ImyEAAAAASUVORK5CYII%3D
   :target: https://console.anyscale.com/register/ha?utm_source=github&utm_medium=ray_readme&utm_campaign=get_started_badge
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI libraries for simplifying ML compute:
.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/what-is-ray-padded.svg
.. https://docs.google.com/drawings/d/1Pl8aCYOsZCo61cmp57c7Sja6HhIygGCvSZLi_AuBuqo/edit
Learn more about `Ray AI Libraries`_:

- `Data`_: Scalable Datasets for ML
- `Train`_: Distributed Training
- `Tune`_: Scalable Hyperparameter Tuning
- `RLlib`_: Scalable Reinforcement Learning
- `Serve`_: Scalable and Programmable Serving

Or more about `Ray Core`_ and its key abstractions:

- `Tasks`_: Stateless functions executed in the cluster.
- `Actors`_: Stateful worker processes created in the cluster.
- `Objects`_: Immutable values accessible across the cluster.
Learn more about Monitoring and Debugging:

- Monitor Ray apps and clusters with the `Ray Dashboard <https://docs.ray.io/en/latest/ray-core/ray-dashboard.html>`__.
- Debug Ray apps with the `Ray Distributed Debugger <https://docs.ray.io/en/latest/ray-observability/ray-distributed-debugger.html>`__.
Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing `ecosystem of community integrations`_.

Install Ray with: ``pip install ray``. For nightly wheels, see the `Installation page <https://docs.ray.io/en/latest/ray-overview/installation.html>`__.
.. _Serve: https://docs.ray.io/en/latest/serve/index.html
.. _Data: https://docs.ray.io/en/latest/data/dataset.html
.. _Workflow: https://docs.ray.io/en/latest/workflows/concepts.html
.. _Train: https://docs.ray.io/en/latest/train/train.html
.. _Tune: https://docs.ray.io/en/latest/tune/index.html
.. _RLlib: https://docs.ray.io/en/latest/rllib/index.html
.. _ecosystem of community integrations: https://docs.ray.io/en/latest/ray-overview/ray-libraries.html
Why Ray?
Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.
More Information

- `Documentation`_
- `Ray Architecture whitepaper`_
- `Exoshuffle: large-scale data shuffle in Ray`_
- `Ownership: a distributed futures system for fine-grained tasks`_
- `RLlib paper`_
- `Tune paper`_

Older documents:

- `Ray paper`_
- `Ray HotOS paper`_
- `Ray Architecture v1 whitepaper`_
.. _Ray AI Libraries: https://docs.ray.io/en/latest/ray-air/getting-started.html
.. _Ray Core: https://docs.ray.io/en/latest/ray-core/walkthrough.html
.. _Tasks: https://docs.ray.io/en/latest/ray-core/tasks.html
.. _Actors: https://docs.ray.io/en/latest/ray-core/actors.html
.. _Objects: https://docs.ray.io/en/latest/ray-core/objects.html
.. _Documentation: http://docs.ray.io/en/latest/index.html
.. _Ray Architecture v1 whitepaper: https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/preview
.. _Ray Architecture whitepaper: https://docs.google.com/document/d/1tBw9A4j62ruI5omIJbMxly-la5w4q_TjyJgJL_jN2fI/preview
.. _`Exoshuffle: large-scale data shuffle in Ray`: https://arxiv.org/abs/2203.05072
.. _`Ownership: a distributed futures system for fine-grained tasks`: https://www.usenix.org/system/files/nsdi21-wang.pdf
.. _Ray paper: https://arxiv.org/abs/1712.05889
.. _Ray HotOS paper: https://arxiv.org/abs/1703.03924
.. _RLlib paper: https://arxiv.org/abs/1712.09381
.. _Tune paper: https://arxiv.org/abs/1807.05118
Getting Involved
.. list-table::
   :widths: 25 50 25 25
   :header-rows: 1

   * - Platform
     - Purpose
     - Estimated Response Time
     - Support Level
   * - `Discourse Forum`_
     - For discussions about development and questions about usage.
     - < 1 day
     - Community
   * - `GitHub Issues`_
     - For reporting bugs and filing feature requests.
     - < 2 days
     - Ray OSS Team
   * - `Slack`_
     - For collaborating with other Ray users.
     - < 2 days
     - Community
   * - `StackOverflow`_
     - For asking questions about how to use Ray.
     - 3-5 days
     - Community
   * - `Meetup Group`_
     - For learning about Ray projects and best practices.
     - Monthly
     - Ray DevRel
   * - `Twitter`_
     - For staying up-to-date on new features.
     - Daily
     - Ray DevRel
.. _Discourse Forum: https://discuss.ray.io/
.. _GitHub Issues: https://github.com/ray-project/ray/issues
.. _StackOverflow: https://stackoverflow.com/questions/tagged/ray
.. _Meetup Group: https://www.meetup.com/Bay-Area-Ray-Meetup/
.. _Twitter: https://twitter.com/raydistributed
.. _Slack: https://forms.gle/9TSdDYUgxYs8SA9e8