Convert Figma logo to code with AI

ray-project logoray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

34,860
5,928
34,860
4,258

Top Related Projects

12,495

Parallel computing with task scheduling

40,184

Apache Spark - A unified analytics engine for large-scale data processing

14,221

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

35,868

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

85,015

Tensors and Dynamic neural networks in Python with strong GPU acceleration

18,503

Open source platform for the machine learning lifecycle

Quick Overview

Ray is an open-source distributed computing framework that enables scaling Python applications from a laptop to a cluster. It provides a simple, universal API for building distributed applications and includes libraries for machine learning, reinforcement learning, and other compute-intensive tasks.

Pros

  • Seamless scaling from local development to large clusters
  • Rich ecosystem of libraries for AI/ML tasks (e.g., Ray Tune, RLlib)
  • Easy-to-use API for distributed computing
  • Active community and regular updates

Cons

  • Steeper learning curve for complex distributed systems
  • Documentation can be overwhelming for beginners
  • Some features may be overkill for smaller projects
  • Occasional stability issues in cutting-edge features

Code Examples

  1. Simple parallel task execution:
import ray

@ray.remote
def f(x):
    return x * x

ray.init()
futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
  1. Distributed machine learning with Ray Tune:
from ray import tune

def objective(config):
    score = config["x"] ** 2 + config["y"] ** 2
    return {"score": score}

analysis = tune.run(
    objective,
    config={
        "x": tune.uniform(-10, 10),
        "y": tune.uniform(-10, 10),
    },
    num_samples=100
)

print("Best config:", analysis.get_best_config(metric="score", mode="min"))
  1. Actor-based stateful computation:
import ray

@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

ray.init()
counter = Counter.remote()
print(ray.get([counter.increment.remote() for _ in range(5)]))  # [1, 2, 3, 4, 5]

Getting Started

To get started with Ray:

  1. Install Ray:
pip install ray
  1. Import Ray and initialize:
import ray
ray.init()
  1. Define remote functions or actors:
@ray.remote
def my_function(x):
    return x * 2

result = ray.get(my_function.remote(4))
print(result)  # 8

For more advanced usage, refer to the official Ray documentation for specific libraries and use cases.

Competitor Comparisons

12,495

Parallel computing with task scheduling

Pros of Dask

  • Seamless integration with NumPy and Pandas, making it easier for data scientists familiar with these libraries
  • More mature and established ecosystem for data analysis and scientific computing
  • Better support for out-of-core computations and handling larger-than-memory datasets

Cons of Dask

  • Less suitable for general-purpose distributed computing tasks
  • Limited support for machine learning workloads compared to Ray
  • Slower performance for certain types of parallel computations

Code Comparison

Dask:

import dask.dataframe as dd

df = dd.read_csv('large_file.csv')
result = df.groupby('column').mean().compute()

Ray:

import ray
import pandas as pd

@ray.remote
def process_chunk(chunk):
    return chunk.groupby('column').mean()

df = pd.read_csv('large_file.csv')
result = ray.get([process_chunk.remote(chunk) for chunk in np.array_split(df, 100)])

This comparison shows how Dask provides a more familiar interface for data processing tasks, while Ray offers more flexibility for custom parallel computations.

40,184

Apache Spark - A unified analytics engine for large-scale data processing

Pros of Spark

  • Mature ecosystem with extensive libraries and integrations
  • Robust support for SQL and structured data processing
  • Strong community and enterprise adoption

Cons of Spark

  • Steeper learning curve for beginners
  • Higher resource consumption for small-scale tasks
  • Less flexible for general-purpose distributed computing

Code Comparison

Spark:

val df = spark.read.json("data.json")
df.groupBy("category").agg(avg("price")).show()

Ray:

@ray.remote
def process_data(data):
    # Custom data processing logic
    return result

results = ray.get([process_data.remote(chunk) for chunk in data_chunks])

Key Differences

  • Spark focuses on big data processing and analytics, while Ray is more versatile for distributed computing tasks
  • Ray offers easier scaling for machine learning and AI workloads
  • Spark has better built-in support for SQL and DataFrames, while Ray provides more flexibility for custom distributed algorithms

Use Cases

  • Choose Spark for large-scale data processing, ETL, and SQL-based analytics
  • Opt for Ray when building distributed applications, reinforcement learning systems, or scaling Python code

Performance Considerations

  • Spark excels at batch processing of large datasets
  • Ray offers lower latency for fine-grained tasks and better support for GPU acceleration
14,221

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Pros of Horovod

  • Specialized for distributed deep learning, particularly with TensorFlow, PyTorch, and MXNet
  • Efficient communication using ring-allreduce algorithm
  • Easier to scale existing single-GPU training scripts to multiple GPUs or nodes

Cons of Horovod

  • Limited to deep learning workloads, less versatile for general distributed computing
  • Requires more manual configuration for complex distributed setups
  • Less extensive ecosystem and fewer built-in features compared to Ray

Code Comparison

Horovod:

import horovod.tensorflow as hvd
hvd.init()
optimizer = tf.optimizers.Adam(0.001 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)

Ray:

import ray
from ray import train
@ray.remote
def train_func():
    model.fit(train_data)
results = ray.get([train_func.remote() for _ in range(num_workers)])

Summary

Horovod excels in distributed deep learning scenarios, offering efficient communication and easy scaling of existing scripts. Ray, on the other hand, provides a more versatile distributed computing framework suitable for various tasks beyond deep learning. While Horovod focuses on optimizing specific deep learning workflows, Ray offers a broader ecosystem with more built-in features for general distributed computing needs.

35,868

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • Specialized for deep learning model training and inference optimization
  • Offers advanced memory optimization techniques like ZeRO (Zero Redundancy Optimizer)
  • Tightly integrated with PyTorch for seamless usage

Cons of DeepSpeed

  • More focused on deep learning, less versatile for general distributed computing
  • Steeper learning curve for users not familiar with PyTorch ecosystem
  • Limited support for other ML frameworks compared to Ray's broader ecosystem

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args, model=model, model_parameters=params)
for step, batch in enumerate(data_loader):
    loss = model_engine(batch)
    model_engine.backward(loss)
    model_engine.step()

Ray:

import ray
@ray.remote
def train_step(model, batch):
    loss = model(batch)
    return loss
results = ray.get([train_step.remote(model, batch) for batch in data_loader])

Both frameworks aim to simplify distributed computing, but DeepSpeed focuses on optimizing deep learning workloads, while Ray offers a more general-purpose distributed computing solution. DeepSpeed provides deeper integration with PyTorch and specialized optimizations for large models, whereas Ray offers greater flexibility across various computing tasks and ML frameworks.

85,015

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • More mature and widely adopted in the deep learning community
  • Extensive ecosystem of pre-trained models and libraries
  • Dynamic computational graphs for easier debugging and flexibility

Cons of PyTorch

  • Less suitable for distributed computing and large-scale data processing
  • Limited support for non-deep learning machine learning tasks
  • Steeper learning curve for beginners compared to Ray

Code Comparison

PyTorch example:

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)

Ray example:

import ray

@ray.remote
def add(x, y):
    return x + y

result = ray.get(add.remote(1, 2))

PyTorch focuses on tensor operations and neural network building blocks, while Ray emphasizes distributed computing and parallel task execution. PyTorch is primarily used for deep learning tasks, whereas Ray is a more general-purpose framework for scaling Python applications and machine learning workloads.

18,503

Open source platform for the machine learning lifecycle

Pros of MLflow

  • Focused on experiment tracking and model management
  • Easier to integrate with existing ML workflows
  • More comprehensive model versioning and deployment features

Cons of MLflow

  • Less scalable for distributed computing tasks
  • Limited support for parallel and distributed training

Code Comparison

MLflow:

import mlflow

mlflow.start_run()
mlflow.log_param("param1", value1)
mlflow.log_metric("metric1", value2)
mlflow.end_run()

Ray:

import ray

@ray.remote
def task(x):
    return x * x

futures = [task.remote(i) for i in range(4)]
print(ray.get(futures))

MLflow focuses on tracking experiments and managing models, making it easier to log parameters, metrics, and artifacts. Ray, on the other hand, excels at distributed computing and parallel task execution, providing a more scalable solution for large-scale machine learning workloads.

While MLflow offers better model versioning and deployment features, Ray provides a more robust framework for distributed computing and scaling machine learning pipelines. MLflow is generally easier to integrate into existing workflows, but Ray offers more flexibility for complex, distributed tasks.

Convert Figma logo designs to code with AI

Visual Copilot

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/ray_header_logo.png

.. image:: https://readthedocs.org/projects/ray/badge/?version=master :target: http://docs.ray.io/en/master/?badge=master

.. image:: https://img.shields.io/badge/Ray-Join%20Slack-blue :target: https://forms.gle/9TSdDYUgxYs8SA9e8

.. image:: https://img.shields.io/badge/Discuss-Ask%20Questions-blue :target: https://discuss.ray.io/

.. image:: https://img.shields.io/twitter/follow/raydistributed.svg?style=social&logo=twitter :target: https://twitter.com/raydistributed

.. image:: https://img.shields.io/badge/Get_started_for_free-3C8AE9?logo=data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8%2F9hAAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAEKADAAQAAAABAAAAEAAAAAA0VXHyAAABKElEQVQ4Ea2TvWoCQRRGnWCVWChIIlikC9hpJdikSbGgaONbpAoY8gKBdAGfwkfwKQypLQ1sEGyMYhN1Pd%2B6A8PqwBZeOHt%2FvsvMnd3ZXBRFPQjBZ9K6OY8ZxF%2B0IYw9PW3qz8aY6lk92bZ%2BVqSI3oC9T7%2FyCVnrF1ngj93us%2B540sf5BrCDfw9b6jJ5lx%2FyjtGKBBXc3cnqx0INN4ImbI%2Bl%2BPnI8zWfFEr4chLLrWHCp9OO9j19Kbc91HX0zzzBO8EbLK2Iv4ZvNO3is3h6jb%2BCwO0iL8AaWqB7ILPTxq3kDypqvBuYuwswqo6wgYJbT8XxBPZ8KS1TepkFdC79TAHHce%2F7LbVioi3wEfTpmeKtPRGEeoldSP%2FOeoEftpP4BRbgXrYZefsAI%2BP9JU7ImyEAAAAASUVORK5CYII%3D :target: https://console.anyscale.com/register/ha?utm_source=github&utm_medium=ray_readme&utm_campaign=get_started_badge

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI libraries for simplifying ML compute:

.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/what-is-ray-padded.svg

.. https://docs.google.com/drawings/d/1Pl8aCYOsZCo61cmp57c7Sja6HhIygGCvSZLi_AuBuqo/edit

Learn more about Ray AI Libraries_:

  • Data_: Scalable Datasets for ML
  • Train_: Distributed Training
  • Tune_: Scalable Hyperparameter Tuning
  • RLlib_: Scalable Reinforcement Learning
  • Serve_: Scalable and Programmable Serving

Or more about Ray Core_ and its key abstractions:

  • Tasks_: Stateless functions executed in the cluster.
  • Actors_: Stateful worker processes created in the cluster.
  • Objects_: Immutable values accessible across the cluster.

Learn more about Monitoring and Debugging:

  • Monitor Ray apps and clusters with the Ray Dashboard <https://docs.ray.io/en/latest/ray-core/ray-dashboard.html>__.
  • Debug Ray apps with the Ray Distributed Debugger <https://docs.ray.io/en/latest/ray-observability/ray-distributed-debugger.html>__.

Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations_.

Install Ray with: pip install ray. For nightly wheels, see the Installation page <https://docs.ray.io/en/latest/ray-overview/installation.html>__.

.. _Serve: https://docs.ray.io/en/latest/serve/index.html .. _Data: https://docs.ray.io/en/latest/data/dataset.html .. _Workflow: https://docs.ray.io/en/latest/workflows/concepts.html .. _Train: https://docs.ray.io/en/latest/train/train.html .. _Tune: https://docs.ray.io/en/latest/tune/index.html .. _RLlib: https://docs.ray.io/en/latest/rllib/index.html .. _ecosystem of community integrations: https://docs.ray.io/en/latest/ray-overview/ray-libraries.html

Why Ray?

Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.

Ray is a unified way to scale Python and AI applications from a laptop to a cluster.

With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.

More Information

  • Documentation_
  • Ray Architecture whitepaper_
  • Exoshuffle: large-scale data shuffle in Ray_
  • Ownership: a distributed futures system for fine-grained tasks_
  • RLlib paper_
  • Tune paper_

Older documents:

  • Ray paper_
  • Ray HotOS paper_
  • Ray Architecture v1 whitepaper_

.. _Ray AI Libraries: https://docs.ray.io/en/latest/ray-air/getting-started.html .. _Ray Core: https://docs.ray.io/en/latest/ray-core/walkthrough.html .. _Tasks: https://docs.ray.io/en/latest/ray-core/tasks.html .. _Actors: https://docs.ray.io/en/latest/ray-core/actors.html .. _Objects: https://docs.ray.io/en/latest/ray-core/objects.html .. _Documentation: http://docs.ray.io/en/latest/index.html .. _Ray Architecture v1 whitepaper: https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/preview .. _Ray Architecture whitepaper: https://docs.google.com/document/d/1tBw9A4j62ruI5omIJbMxly-la5w4q_TjyJgJL_jN2fI/preview .. _Exoshuffle: large-scale data shuffle in Ray: https://arxiv.org/abs/2203.05072 .. _Ownership: a distributed futures system for fine-grained tasks: https://www.usenix.org/system/files/nsdi21-wang.pdf .. _Ray paper: https://arxiv.org/abs/1712.05889 .. _Ray HotOS paper: https://arxiv.org/abs/1703.03924 .. _RLlib paper: https://arxiv.org/abs/1712.09381 .. _Tune paper: https://arxiv.org/abs/1807.05118

Getting Involved

.. list-table:: :widths: 25 50 25 25 :header-rows: 1

    • Platform
    • Purpose
    • Estimated Response Time
    • Support Level
    • Discourse Forum_
    • For discussions about development and questions about usage.
    • < 1 day
    • Community
    • GitHub Issues_
    • For reporting bugs and filing feature requests.
    • < 2 days
    • Ray OSS Team
    • Slack_
    • For collaborating with other Ray users.
    • < 2 days
    • Community
    • StackOverflow_
    • For asking questions about how to use Ray.
    • 3-5 days
    • Community
    • Meetup Group_
    • For learning about Ray projects and best practices.
    • Monthly
    • Ray DevRel
    • Twitter_
    • For staying up-to-date on new features.
    • Daily
    • Ray DevRel

.. _Discourse Forum: https://discuss.ray.io/ .. _GitHub Issues: https://github.com/ray-project/ray/issues .. _StackOverflow: https://stackoverflow.com/questions/tagged/ray .. _Meetup Group: https://www.meetup.com/Bay-Area-Ray-Meetup/ .. _Twitter: https://twitter.com/raydistributed .. _Slack: https://www.ray.io/join-slack?utm_source=github&utm_medium=ray_readme&utm_campaign=getting_involved