deepseek-ai/open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

Top Related Projects

  • huggingface_hub: The official Python client for the Huggingface Hub.
  • DeepSpeed (37,573 stars): DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • PyTorch (88,135 stars): Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • TensorFlow (188,828 stars): An Open Source Machine Learning Framework for Everyone
  • ONNX (18,872 stars): Open standard for machine learning interoperability
  • MLflow (20,329 stars): Open source platform for the machine learning lifecycle

Quick Overview

The deepseek-ai/open-infra-index repository is the landing page for DeepSeek's February 2025 Open-Source Week. Its README links to the infrastructure repositories released that week (FlashMLA, DeepEP, DeepGEMM, DualPipe, EPLB, 3FS, and Smallpond) and summarizes the DeepSeek-V3/R1 inference system. It is an index of those projects rather than a library in its own right.

Pros

  • Single entry point to the infrastructure projects DeepSeek open-sourced during its Open-Source Week
  • Open-source, with each entry linking to its own GitHub repository
  • Organized by release day, with headline features and benchmark figures for each project
  • Helps developers and researchers discover production-tested AI infrastructure components

Cons

  • A point-in-time snapshot that may require regular maintenance to stay up-to-date
  • Covers only DeepSeek's own projects, so it is not a neutral survey of AI infrastructure
  • Contains no code of its own; each tool lives in a separate repository
  • Does not cover niche or specialized AI infrastructure tools from other sources

As this is not a code library, we'll skip the code examples and getting started instructions sections.

Competitor Comparisons

The official Python client for the Huggingface Hub.

Pros of huggingface_hub

  • Extensive documentation and examples for easy integration
  • Large community support and active development
  • Seamless integration with popular machine learning frameworks

Cons of huggingface_hub

  • Focused primarily on machine learning models, limiting versatility
  • Potential for slower performance due to its broad scope

Code Comparison

huggingface_hub:

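from huggingface_hub import snapshot_download

# Download a snapshot of a Hub repository into a local folder.
# "username/repo-name" and the folder path are placeholders.
# (The older git-based `Repository` class is deprecated.)
local_path = snapshot_download(
    repo_id="username/repo-name",
    local_dir="path/to/local/folder",
)
print(local_path)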

open-infra-index: No comparable code is available. The repository is an index of DeepSeek's open-source infrastructure projects and does not ship a Python package or API.

The huggingface_hub snippet fetches a repository from the Hub. huggingface_hub provides tooling for downloading, managing, and sharing models, whereas open-infra-index only documents and links to infrastructure projects, so the two are not directly comparable in code.

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • More mature and widely adopted project with extensive documentation
  • Offers a comprehensive suite of optimization techniques for deep learning
  • Supports distributed training across multiple GPUs and nodes

Cons of DeepSpeed

  • Steeper learning curve due to its extensive feature set
  • Primarily focused on PyTorch, limiting its use with other frameworks
  • Requires more configuration and setup compared to simpler alternatives

Code Comparison

DeepSpeed:

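import deepspeed

# `args`, `model`, and `params` must be defined by the caller: parsed
# command-line arguments, a torch.nn.Module, and the parameters to optimize.
# initialize() returns (engine, optimizer, dataloader, lr_scheduler).
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)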

open-infra-index: No relevant code comparison available, as open-infra-index is not a deep learning optimization library but rather an index of open-source infrastructure projects.

Summary

DeepSpeed is a powerful deep learning optimization library, offering advanced features for training large models efficiently. It excels in distributed training and provides various optimization techniques. However, it may be more complex to set up and use compared to simpler alternatives.

open-infra-index, on the other hand, serves a different purpose as an index of open-source infrastructure projects. It doesn't provide direct functionality for deep learning optimization, making a direct comparison with DeepSpeed less relevant in terms of technical features and code usage.

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • Mature, widely-used deep learning framework with extensive documentation and community support
  • Offers dynamic computational graphs, making it more flexible for complex model architectures
  • Provides a rich ecosystem of tools and libraries for various AI/ML tasks

Cons of PyTorch

  • Larger codebase and more complex setup than open-infra-index, which contains only documentation
  • Steeper learning curve for beginners due to its comprehensive feature set
  • May have higher resource requirements for basic tasks

Code Comparison

PyTorch example (tensor creation and operation):

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)

open-infra-index doesn't have directly comparable code, as it's an index of DeepSeek's open-source AI infrastructure projects rather than a deep learning framework. Its primary function is to provide information about, and links to, those repositories.

Summary

PyTorch is a powerful deep learning framework suitable for complex AI/ML tasks, while open-infra-index serves as an index of DeepSeek's open-source AI infrastructure projects. PyTorch offers far more functionality but requires more resources and expertise, whereas open-infra-index is a reference page for discovering those projects.

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Extensive ecosystem with robust tools and libraries
  • Strong community support and extensive documentation
  • Widely adopted in industry and research

Cons of TensorFlow

  • Steeper learning curve for beginners
  • Can be slower for prototyping compared to some alternatives
  • Large framework size may be overkill for simpler projects

Code Comparison

TensorFlow example:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

open-infra-index doesn't have comparable code, as it's an index repository, not a machine learning framework.

Summary

TensorFlow is a comprehensive machine learning framework with a vast ecosystem and strong community support. It's widely used in industry and research but can have a steeper learning curve. open-infra-index, on the other hand, is an index of DeepSeek's open infrastructure projects and doesn't provide direct machine learning functionality. The choice between them depends on whether you need a machine learning framework (TensorFlow) or are looking for information on those infrastructure projects (open-infra-index).

Open standard for machine learning interoperability

Pros of ONNX

  • Widely adopted standard for machine learning interoperability
  • Extensive ecosystem with support for multiple frameworks and hardware
  • Comprehensive documentation and community support

Cons of ONNX

  • More complex to use for specific infrastructure-related tasks
  • Focuses primarily on machine learning models, not general infrastructure

Code Comparison

ONNX example (model definition):

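import onnx
from onnx import TensorProto, helper

# Declare a float input "x" and output "y", both of shape [1, 3].
x = helper.make_tensor_value_info('x', TensorProto.FLOAT, [1, 3])
y = helper.make_tensor_value_info('y', TensorProto.FLOAT, [1, 3])

# A single Relu node connecting x to y.
node = helper.make_node('Relu', inputs=['x'], outputs=['y'])
graph = helper.make_graph([node], 'test-model', [x], [y])
model = helper.make_model(graph)
onnx.checker.check_model(model)  # validate the model structure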

open-infra-index: No comparable code is available. The repository is an index of DeepSeek's open-source infrastructure projects and does not ship a Python package or API.

ONNX focuses on defining and exchanging machine learning models and provides a standardized format for them across frameworks. open-infra-index serves a different purpose: it documents and links to infrastructure projects such as FlashMLA, DeepEP, DeepGEMM, and 3FS. The choice depends on the use case: ONNX for machine learning interoperability, open-infra-index for finding DeepSeek's infrastructure releases.

Open source platform for the machine learning lifecycle

Pros of MLflow

  • More mature and widely adopted project with extensive documentation
  • Comprehensive end-to-end ML lifecycle management capabilities
  • Strong integration with popular ML frameworks and cloud platforms

Cons of MLflow

  • Steeper learning curve for beginners due to its extensive feature set
  • Requires more setup and configuration for full functionality
  • May be overkill for smaller projects or simpler ML workflows

Code Comparison

MLflow example:

import mlflow

# The context manager ends the run automatically, even if an error occurs.
with mlflow.start_run():
    mlflow.log_param("param1", 5)
    mlflow.log_metric("accuracy", 0.95)

open-infra-index doesn't have comparable code, as it's an index of projects, not an ML platform.

Additional Notes

MLflow is a comprehensive ML lifecycle management platform, while open-infra-index is an index of DeepSeek's open-source AI infrastructure projects. They serve different purposes and aren't directly comparable in terms of functionality.

MLflow offers features like experiment tracking, model versioning, and deployment, making it suitable for data scientists and ML engineers working on various ML projects.

open-infra-index, on the other hand, links to infrastructure projects such as FlashMLA, DeepEP, DeepGEMM, and 3FS, which can be useful for developers looking to explore these tools and integrate them into their AI/ML workflows.

README

DeepSeek-Open-Infra

Hello, DeepSeek Open Infra!

202502 Open-Source Week

We're a tiny team @deepseek-ai pushing our limits in AGI exploration.

Starting this week, Feb 24, 2025, we'll open-source 5 repos – one daily drop – not because we've made grand claims, but simply as developers sharing our small-but-sincere progress with full transparency.

These are humble building blocks of our online service: documented, deployed, and battle-tested in production. No vaporware, just sincere code that moved our tiny yet ambitious dream forward.

Why? Because every line shared becomes collective momentum that accelerates the journey. Daily unlocks begin soon. No ivory towers - just pure garage-energy and community-driven innovation 🔧

Stay tuned – let's geek out in the open together.

Day 1 - FlashMLA

Efficient MLA Decoding Kernel for Hopper GPUs
Optimized for variable-length sequences, battle-tested in production

🔗 FlashMLA GitHub Repo
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ Performance: 3000 GB/s memory-bound | BF16 580 TFLOPS compute-bound on H800
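
To illustrate what a paged KV cache with block size 64 implies, here is a minimal pure-Python sketch of the address translation step: a per-sequence block table maps logical token positions to physical cache blocks. The names `BLOCK_SIZE`, `blocks_needed`, and `locate_token` are hypothetical and are not FlashMLA's API; the real kernel performs the equivalent lookup on the GPU.

```python
BLOCK_SIZE = 64  # block size stated in the FlashMLA notes above


def blocks_needed(seq_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of cache blocks a sequence of seq_len tokens occupies."""
    return (seq_len + block_size - 1) // block_size


def locate_token(block_table: list[int], pos: int,
                 block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a logical token position to (physical block id, offset in block)."""
    logical_block, offset = divmod(pos, block_size)
    return block_table[logical_block], offset


# A 130-token sequence needs 3 blocks; they need not be contiguous in memory.
block_table = [7, 2, 9]  # hypothetical physical block ids
print(blocks_needed(130))              # 3
print(locate_token(block_table, 0))    # (7, 0)
print(locate_token(block_table, 129))  # (9, 1)
```

Because blocks are fixed-size and individually addressed, variable-length sequences can share one cache pool without each reserving space for the maximum length.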

Day 2 - DeepEP

Excited to introduce DeepEP - the first open-source EP (expert parallelism) communication library for MoE model training and inference.

🔗 DeepEP GitHub Repo
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping
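
The all-to-all pattern mentioned above can be sketched as a dispatch step (send each token to the rank hosting its expert) followed by a combine step (return results to their original positions). The following is a single-process simulation in plain Python, not DeepEP's API: the function names, the top-1 routing, and the contiguous expert-to-rank placement are all assumptions made for illustration, and no NVLink or RDMA transport is involved.

```python
def dispatch(tokens, expert_ids, experts_per_rank, num_ranks):
    """Group (source index, expert, token) triples by the destination rank."""
    buckets = [[] for _ in range(num_ranks)]
    for idx, (tok, expert) in enumerate(zip(tokens, expert_ids)):
        # Assumption: experts are placed on ranks in contiguous ranges.
        buckets[expert // experts_per_rank].append((idx, expert, tok))
    return buckets


def combine(buckets, num_tokens):
    """Scatter per-rank (source index, result) pairs back into token order."""
    out = [None] * num_tokens
    for bucket in buckets:
        for idx, result in bucket:
            out[idx] = result
    return out


tokens = [1.0, 2.0, 3.0, 4.0]
expert_ids = [3, 0, 2, 1]  # one routed expert per token (top-1)
buckets = dispatch(tokens, expert_ids, experts_per_rank=2, num_ranks=2)

# Stand-in for expert computation: scale the token by (expert id + 1).
processed = [[(idx, tok * (expert + 1)) for idx, expert, tok in bucket]
             for bucket in buckets]
print(combine(processed, len(tokens)))  # [4.0, 2.0, 9.0, 8.0]
```

Carrying the source index through dispatch is what lets combine restore the original token order regardless of how tokens were bucketed.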

Day 3 - DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

🔗 DeepGEMM GitHub Repo
⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts
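
As a rough illustration of the difference between a dense GEMM and an MoE (grouped) GEMM, the sketch below multiplies each contiguous group of token rows by its own expert weight matrix. It uses plain Python numbers, ignores FP8 and JIT compilation entirely, and is not DeepGEMM's API: `matmul`, `grouped_gemm`, and the contiguous grouping are assumptions for illustration only.

```python
def matmul(a, b):
    """Plain dense GEMM on nested lists: (m x k) @ (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def grouped_gemm(tokens, group_sizes, expert_weights):
    """Multiply each contiguous group of token rows by its own expert weight."""
    out, start = [], 0
    for size, weight in zip(group_sizes, expert_weights):
        out.extend(matmul(tokens[start:start + size], weight))
        start += size
    return out


tokens = [[1, 0], [0, 1], [1, 1]]  # 3 tokens, hidden size 2
group_sizes = [2, 1]               # rows 0-1 go to expert 0, row 2 to expert 1
expert_weights = [
    [[1, 2], [3, 4]],              # expert 0 weight (2 x 2)
    [[10, 0], [0, 10]],            # expert 1 weight (2 x 2)
]
print(grouped_gemm(tokens, group_sizes, expert_weights))
# [[1, 2], [3, 4], [10, 10]]
```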

Day 4 - Optimized Parallelism Strategies

✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 GitHub Repo

✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗 GitHub Repo

📊 Analyze computation-communication overlap in V3/R1.
🔗 GitHub Repo
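
To give a feel for the problem an expert-parallel load balancer addresses, here is a generic greedy heuristic (heaviest expert first, onto the least-loaded GPU). This is a textbook longest-processing-time placement written for illustration; it is not the EPLB algorithm, which may use a different strategy, and the function name and load figures are hypothetical.

```python
import heapq


def balance_experts(loads, num_gpus):
    """Greedy placement: heaviest expert first, always onto the GPU
    with the smallest total load so far."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (total load, gpu id)
    placement = {gpu: [] for gpu in range(num_gpus)}
    for expert in sorted(range(len(loads)), key=lambda e: -loads[e]):
        total, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (total + loads[expert], gpu))
    totals = {gpu: sum(loads[e] for e in experts)
              for gpu, experts in placement.items()}
    return placement, totals


loads = [90, 10, 40, 30, 20, 50]  # hypothetical per-expert token counts
placement, totals = balance_experts(loads, num_gpus=2)
print(placement)  # {0: [0, 3], 1: [5, 2, 4, 1]}
print(totals)     # {0: 120, 1: 120}
```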

Day 5 - 3FS, Thruster for All DeepSeek Data Access

Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster
⚡ 40+ GiB/s peak throughput per client node for KVCache lookup
🧬 Disaggregated architecture with strong consistency semantics
✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1

📥 3FS → 🔗GitHub Repo
⛲ Smallpond - data processing framework on 3FS → 🔗GitHub Repo
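
For a sense of scale, the aggregate figures above can be converted to rough per-node averages. This is simple arithmetic on the quoted numbers; dividing evenly across nodes is an assumption, and the source does not say how load was actually distributed.

```python
GIB_PER_TIB = 1024

# Figures quoted above.
read_tib_per_s, read_nodes = 6.6, 180
sort_tib_per_min, sort_nodes = 3.66, 25

# Assumption: throughput is spread evenly across the nodes in each cluster.
read_per_node_gib_s = read_tib_per_s * GIB_PER_TIB / read_nodes
sort_per_node_gib_s = sort_tib_per_min * GIB_PER_TIB / 60 / sort_nodes

print(f"read: {read_per_node_gib_s:.1f} GiB/s per node")      # read: 37.5 GiB/s per node
print(f"GraySort: {sort_per_node_gib_s:.2f} GiB/s per node")  # GraySort: 2.50 GiB/s per node
```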

Day 6 - One More Thing: DeepSeek-V3/R1 Inference System Overview

Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing

Production data of V3/R1 online services:
⚡ 73.7k/14.8k input/output tokens per second per H800 node
🚀 Cost profit margin 545%
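
A quick back-of-the-envelope reading of these two figures follows. It assumes that "cost profit margin" means (income - cost) / cost, and it extrapolates the per-second rates to a full day as if they were sustained, which the source does not claim; both are assumptions for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

input_tps, output_tps = 73_700, 14_800  # tokens/s per H800 node, quoted above
margin_pct = 545                        # cost profit margin, quoted above

# Assumption: margin = (income - cost) / cost, so income = cost * (1 + margin).
income_per_cost = 1 + margin_pct / 100

# Assumption: the quoted rates hold for all 24 hours.
print(input_tps * SECONDS_PER_DAY)   # 6367680000 input tokens/day/node
print(output_tps * SECONDS_PER_DAY)  # 1278720000 output tokens/day/node
print(round(income_per_cost, 2))     # 6.45
```

Under that definition, a 545% margin corresponds to theoretical income of about 6.45 times cost.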

[Figure: Cost and theoretical income]

💡 We hope this week's insights offer value to the community and contribute to our shared AGI goals.

📖 Deep Dive: 🔗Day 6 - One More Thing: DeepSeek-V3/R1 Inference System Overview
📖 中文版: 🔗DeepSeek-V3 / R1 推理系统概览

2024 AI Infrastructure Paper (SC24)

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

📄 Paper Link
📄 Arxiv Paper Link