Top Related Projects
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
Tensors and Dynamic neural networks in Python with strong GPU acceleration
An Open Source Machine Learning Framework for Everyone
NumPy aware dynamic Python compiler using LLVM
ArrayFire: a general purpose GPU library.
a language for fast, portable data-parallel computation
Quick Overview
Taichi is an open-source, high-performance programming language for computer graphics, computational physics, and AI. It aims to give developers a portable way to write complex numerical computations and simulations that run efficiently on both CPUs and GPUs.
Pros
- High performance: Taichi offers efficient compilation and execution on various hardware platforms, including CPUs and GPUs.
- Easy to learn: The language is designed to be intuitive for Python programmers, with a syntax similar to Python.
- Versatile: Taichi supports a wide range of applications, from computer graphics to scientific computing.
- Cross-platform: It can run on multiple operating systems and hardware architectures.
Cons
- Limited ecosystem: Compared to more established languages, Taichi has a smaller community and fewer third-party libraries.
- Learning curve: While easy for Python developers, it may require some adjustment for those coming from other programming backgrounds.
- Documentation: Although improving, the documentation can be sparse in some areas, especially for advanced features.
Code Examples
- Basic vector computation:
import taichi as ti
ti.init(arch=ti.cpu)
n = 320
pixels = ti.field(dtype=float, shape=(n * 2, n))
@ti.kernel
def paint(t: float):
    for i, j in pixels:  # parallelized over all pixels
        pixels[i, j] = i * 0.001 + j * 0.002 + t
gui = ti.GUI("Taichi Example", res=(n * 2, n))
for i in range(1000000):
    paint(i * 0.001)
    gui.set_image(pixels)
    gui.show()
This example creates a simple animation using Taichi's field and kernel concepts.
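To make the kernel's meaning concrete, here is a plain-Python sketch of what `paint` computes for every index of the field. The function name `paint_reference` and the small grid size are illustrative assumptions; Taichi itself runs the outermost loop in parallel rather than sequentially as shown here.

```python
def paint_reference(width, height, t):
    """Plain-Python equivalent of the paint kernel: one value per (i, j) index."""
    return [[i * 0.001 + j * 0.002 + t for j in range(height)]
            for i in range(width)]

# A tiny 8 x 4 "field" instead of the 640 x 320 one used above.
frame = paint_reference(8, 4, 0.5)
print(len(frame), len(frame[0]))
print(frame[3][2])
```

A reference like this can be handy for spot-checking a few entries of the field after a kernel call, keeping in mind that Taichi's default `float` is 32-bit while Python floats are 64-bit.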
- Particle simulation:
import taichi as ti
ti.init(arch=ti.gpu)
n_particles = 8192
n_grid = 128
dt = 2e-4
particles = ti.Vector.field(2, dtype=float, shape=n_particles)
grid_v = ti.Vector.field(2, dtype=float, shape=(n_grid, n_grid))
grid_m = ti.field(dtype=float, shape=(n_grid, n_grid))
@ti.kernel
def substep():
    # Particle-to-grid: scatter each particle onto its 3x3 grid neighborhood
    for p in particles:
        # Offset by 0.5 so the 3x3 stencil is centered on the particle
        base = (particles[p] * n_grid - 0.5).cast(int)
        fx = particles[p] * n_grid - base.cast(float)
        # Quadratic B-spline weights, one 2D vector per stencil offset
        w = [0.5 * (1.5 - fx) ** 2, 0.75 - (fx - 1) ** 2, 0.5 * (fx - 0.5) ** 2]
        for i in ti.static(range(3)):
            for j in ti.static(range(3)):
                offset = ti.Vector([i, j])
                weight = w[i][0] * w[j][1]
                grid_v[base + offset] += weight * particles[p]
                grid_m[base + offset] += weight
    # Grid update: normalize by accumulated weight, then apply damping and gravity
    for i, j in grid_m:
        if grid_m[i, j] > 0:
            grid_v[i, j] /= grid_m[i, j]
        grid_v[i, j] *= ti.exp(-dt * 10)
        grid_v[i, j].y -= dt * 200
    # Grid-to-particle: gather the updated grid values back to each particle
    for p in particles:
        base = (particles[p] * n_grid - 0.5).cast(int)
        fx = particles[p] * n_grid - base.cast(float)
        w = [0.5 * (1.5 - fx) ** 2, 0.75 - (fx - 1) ** 2, 0.5 * (fx - 0.5) ** 2]
        new_v = ti.Vector.zero(float, 2)
        for i in ti.static(range(3)):
            for j in ti.static(range(3)):
                offset = ti.Vector([i, j])
                weight = w[i][0] * w[j][1]
                new_v += weight * grid_v[base + offset]
        # The source listing breaks off after "new_v *"; dt is an assumed completion
        particles[p] += new_v * dt
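The three weight expressions in the kernel have the form of quadratic B-spline weights. As a small sketch (the helper name `bspline_weights` is an assumption for illustration, not part of Taichi), the following checks in plain Python that they sum to one for any `fx` and stay non-negative when `fx` lies between 0.5 and 1.5, which is the range obtained when the base cell is computed as `int(x * n_grid - 0.5)`.

```python
def bspline_weights(fx):
    """Quadratic B-spline weights for the three nodes of a 1D stencil."""
    return [0.5 * (1.5 - fx) ** 2,
            0.75 - (fx - 1.0) ** 2,
            0.5 * (fx - 0.5) ** 2]

for fx in (0.5, 0.75, 1.0, 1.25, 1.5):
    w = bspline_weights(fx)
    print(fx, w, sum(w))

# Outside [0.5, 1.5] the middle weight can turn negative, e.g. at fx = 0.
print(bspline_weights(0.0))
```

In two dimensions the kernel multiplies one weight per axis, so the nine products over the 3x3 stencil also sum to one.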
Competitor Comparisons
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
Pros of cuda-samples
- Provides official, optimized examples directly from NVIDIA
- Covers a wide range of CUDA programming techniques and applications
- Excellent resource for learning CUDA and GPU programming best practices
Cons of cuda-samples
- Focused solely on CUDA, limiting portability to non-NVIDIA hardware
- Requires more low-level programming knowledge compared to Taichi
- Less suitable for rapid prototyping or high-level abstractions
Code Comparison
Taichi:
import taichi as ti
ti.init(arch=ti.gpu)
x = ti.field(dtype=ti.f32, shape=(10000,))
@ti.kernel
def compute():
    for i in x:
        x[i] = ti.sin(i * 0.1)
cuda-samples:
__global__ void compute(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = sinf(i * 0.1f);
    }
}
// Kernel launch: compute<<<blocks, threads>>>(d_x, N);
This comparison highlights the higher-level abstraction provided by Taichi, which automatically handles memory management and kernel launching, while cuda-samples offers more fine-grained control over GPU resources and execution.
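The launch arithmetic that the CUDA version leaves to the programmer can be sketched in plain Python. The helper names and the block size of 256 are assumptions chosen for illustration; the source does not state which `blocks` and `threads` values it uses.

```python
def launch_config(n, threads_per_block=256):
    """Smallest number of blocks whose threads cover n elements (ceiling division)."""
    blocks = (n + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

def covered_indices(n, blocks, threads):
    """Simulate i = blockIdx.x * blockDim.x + threadIdx.x with the i < n guard."""
    hit = []
    for block_idx in range(blocks):
        for thread_idx in range(threads):
            i = block_idx * threads + thread_idx
            if i < n:  # threads past the end of the array do nothing
                hit.append(i)
    return hit

blocks, threads = launch_config(10000)
print(blocks, threads, blocks * threads - 10000)  # blocks, threads, idle threads
```

For the 10000-element field above, this gives 40 blocks of 256 threads, with the bounds check masking off the 240 surplus threads. In the Taichi version, the `for i in x` loop takes care of this mapping.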
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Larger community and ecosystem, with more resources and third-party libraries
- Better support for deep learning and neural networks
- More mature and stable, with a longer development history
Cons of PyTorch
- Higher memory usage and slower compilation times
- Less efficient for certain types of parallel computing tasks
- Steeper learning curve for beginners
Code Comparison
PyTorch:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)
Taichi:
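import taichi as ti
ti.init()
x = ti.field(ti.i32, 3)
y = ti.field(ti.i32, 3)
z = ti.field(ti.i32, 3)
# Fill the inputs with the same values as the PyTorch example
for i in range(3):
    x[i] = i + 1
    y[i] = i + 4
@ti.kernel
def add():
    for i in range(3):
        z[i] = x[i] + y[i]
add()
print(z)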
The PyTorch example is more concise and intuitive for simple operations, while Taichi requires more setup but offers finer control over memory and computation. Taichi's approach is particularly beneficial for complex simulations and high-performance computing tasks.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Extensive ecosystem with robust tools and libraries
- Strong support for distributed computing and large-scale deployments
- Comprehensive documentation and large community support
Cons of TensorFlow
- Steeper learning curve, especially for beginners
- Can be slower for prototyping compared to more dynamic frameworks
- Larger memory footprint and slower startup times
Code Comparison
Taichi:
import taichi as ti
ti.init()
x = ti.field(ti.f32, shape=5)
@ti.kernel
def compute():
    for i in x:
        x[i] = i * 2
TensorFlow:
import tensorflow as tf
x = tf.Variable([0, 1, 2, 3, 4], dtype=tf.float32)
y = x * 2  # evaluated eagerly in TensorFlow 2.x
result = y.numpy()
Both examples demonstrate basic array operations. Taichi expresses the computation as an explicit parallel loop inside a kernel, while TensorFlow 2.x evaluates whole-array expressions eagerly. TensorFlow 1.x, by contrast, required explicit session management (tf.Session) and variable initialization, which was more verbose for simple operations.
NumPy aware dynamic Python compiler using LLVM
Pros of Numba
- Seamless integration with NumPy and Python ecosystem
- Supports a wider range of Python features and data types
- More mature and established project with extensive documentation
Cons of Numba
- GPU acceleration is limited to CUDA, whereas Taichi also targets backends such as Vulkan, Metal, and OpenGL
- Performance gains may be less significant for complex algorithms
- Requires careful attention to supported features for optimal performance
Code Comparison
Taichi:
import taichi as ti
ti.init(arch=ti.gpu)
@ti.kernel
def saxpy(a: float, x: ti.template(), y: ti.template()):
    for i in x:
        y[i] += a * x[i]
Numba:
from numba import cuda
@cuda.jit
def saxpy(a, x, y):
    i = cuda.grid(1)
    if i < x.shape[0]:
        y[i] += a * x[i]
Both Taichi and Numba aim to accelerate Python code, but they have different approaches and strengths. Taichi focuses on high-performance computing for physical simulations and computer graphics, while Numba provides a more general-purpose solution for accelerating numerical Python code. Taichi offers better GPU utilization and a unified programming model across different architectures, whereas Numba excels in its compatibility with existing Python and NumPy code.
ArrayFire: a general purpose GPU library.
Pros of ArrayFire
- Mature library with extensive documentation and examples
- Supports multiple backends (CPU, CUDA, OpenCL) for flexibility
- Offers a wide range of pre-built functions for scientific computing
Cons of ArrayFire
- Less focus on high-performance computing for graphics and simulation
- Steeper learning curve for beginners compared to Taichi's simplicity
- Limited support for custom hardware architectures
Code Comparison
Taichi:
import taichi as ti
ti.init(arch=ti.gpu)
x = ti.field(float, shape=5)
@ti.kernel
def compute():
    for i in x:
        x[i] = i * 2
ArrayFire:
#include <arrayfire.h>
using namespace af;
array x = range(5);
x = x * 2;
Both libraries aim to simplify parallel computing, but Taichi focuses on ease of use and automatic optimization for various architectures, while ArrayFire provides a more traditional array programming interface with multiple backend support. Taichi's Python-based syntax may be more approachable for some users, while ArrayFire's C++ implementation might offer better performance in certain scenarios.
a language for fast, portable data-parallel computation
Pros of Halide
- More mature and established project with a longer history
- Stronger focus on image processing and computational photography
- Better support for GPU acceleration across multiple platforms
Cons of Halide
- Steeper learning curve due to its domain-specific language
- Less flexible for general-purpose computing tasks
- More complex setup and compilation process
Code Comparison
Halide:
Func blur_3x3(Func input) {
    Func blur_x, blur_y;
    Var x, y, xi, yi;
    blur_x(x, y) = (input(x-1, y) + input(x, y) + input(x+1, y)) / 3;
    blur_y(x, y) = (blur_x(x, y-1) + blur_x(x, y) + blur_x(x, y+1)) / 3;
    blur_y.tile(x, y, xi, yi, 256, 32)
          .vectorize(xi, 8).parallel(y);
    return blur_y;
}
Taichi:
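@ti.kernel
def blur_3x3(input: ti.template(), output: ti.template()):
    for i, j in output:
        total = 0.0  # renamed from `sum` to avoid shadowing the Python built-in
        for di in range(-1, 2):
            for dj in range(-1, 2):
                # Note: border pixels index outside `input`; handle borders before relying on the result
                total += input[i + di, j + dj]
        output[i, j] = total / 9.0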
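The Taichi blur above reads neighbors at `i - 1`, `i + 1`, `j - 1`, and `j + 1` without any border handling. As a hedged reference, the sketch below implements the same 3x3 box blur in plain Python with indices clamped to the image edge. Clamping is one possible border policy chosen here for illustration, not something the source specifies, and `blur_3x3_clamped` is a hypothetical name.

```python
def blur_3x3_clamped(image):
    """3x3 box blur over a list-of-lists image, clamping neighbor indices at the border."""
    height = len(image)
    width = len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), height - 1)  # clamp row index
                    jj = min(max(j + dj, 0), width - 1)   # clamp column index
                    total += image[ii][jj]
            out[i][j] = total / 9.0
    return out

# A single bright pixel in the middle of a 3x3 image spreads evenly.
impulse = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
for row in blur_3x3_clamped(impulse):
    print(row)
```

The same clamping could be applied to the index expressions inside a Taichi kernel; a reference like this is mainly useful for checking a few output pixels.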
README

pip install taichi # Install Taichi Lang
ti gallery # Launch demo gallery
What is Taichi Lang?
Taichi Lang is an open-source, imperative, parallel programming language for high-performance numerical computation. It is embedded in Python and uses just-in-time (JIT) compiler frameworks, for example LLVM, to offload the compute-intensive Python code to the native GPU or CPU instructions.
The language has broad applications spanning real-time physical simulation, numerical computation, augmented reality, artificial intelligence, vision and robotics, visual effects in films and games, general-purpose computing, and much more.
Why Taichi Lang?
- Built around Python: Taichi Lang shares almost the same syntax with Python, allowing you to write algorithms with minimal language barrier. It is also well integrated into the Python ecosystem, including NumPy and PyTorch.
- Flexibility: Taichi Lang provides a set of generic data containers known as SNode (/ˈsnoʊd/), an effective mechanism for composing hierarchical, multi-dimensional fields. This can cover many use patterns in numerical simulation (e.g. spatially sparse computing).
- Performance: With the @ti.kernel decorator, Taichi Lang's JIT compiler automatically compiles your Python functions into efficient GPU or CPU machine code for parallel execution.
- Portability: Write your code once and run it everywhere. Currently, Taichi Lang supports most mainstream GPU APIs, such as CUDA and Vulkan.
- ... and many more features! A cross-platform, Vulkan-based 3D visualizer, differentiable programming, quantized computation (experimental), etc.
Getting Started
Installation
Prerequisites
- Operating systems
  - Windows
  - Linux
  - macOS
- Python: 3.6 ~ 3.10 (64-bit only)
- Compute backends
  - x64/ARM CPUs
  - CUDA
  - Vulkan
  - OpenGL (4.3+)
  - Apple Metal
  - WebAssembly (experimental)
Use Python's package installer pip to install Taichi Lang:
pip install --upgrade taichi
We also provide a nightly package. Nightly packages are not fully tested and may crash; we cannot guarantee their validity, and you try our latest, untested features at your own risk. The nightly packages can be installed from our self-hosted PyPI (self-hosting allows us to provide more frequent releases over a longer period of time):
pip install -i https://pypi.taichi.graphics/simple/ taichi-nightly
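Before or after installing, a quick check against the prerequisites listed above can save some debugging. The sketch below uses only the standard library; the helper names are hypothetical, and the 3.6 to 3.10 range is simply the one stated in this README, so it may not match the requirements of the release you install.

```python
import importlib.util
import struct
import sys

def meets_prerequisites(version, pointer_bits):
    """True if the interpreter is a 64-bit Python within the 3.6 to 3.10 range given above."""
    return (3, 6) <= tuple(version[:2]) <= (3, 10) and pointer_bits == 64

def taichi_installed():
    """True if a package named taichi can be found, without importing it."""
    return importlib.util.find_spec("taichi") is not None

pointer_bits = struct.calcsize("P") * 8  # 64 on a 64-bit interpreter
print("Python", sys.version_info[:2], "-", pointer_bits, "bit")
print("Prerequisites met:", meets_prerequisites(sys.version_info, pointer_bits))
print("Taichi installed:", taichi_installed())
```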
Run your "Hello, world!"
Here is how you can program a 2D fractal in Taichi:
# python/taichi/examples/simulation/fractal.py
import taichi as ti
ti.init(arch=ti.gpu)
n = 320
pixels = ti.field(dtype=float, shape=(n * 2, n))
@ti.func
def complex_sqr(z):
    return ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2])
@ti.kernel
def paint(t: float):
    for i, j in pixels:  # Parallelized over all pixels
        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
        iterations = 0
        while z.norm() < 20 and iterations < 50:
            z = complex_sqr(z) + c
            iterations += 1
        pixels[i, j] = 1 - iterations * 0.02
gui = ti.GUI("Julia Set", res=(n * 2, n))
for i in range(1000000):
    paint(i * 0.03)
    gui.set_image(pixels)
    gui.show()
If Taichi Lang is properly installed, you should see an animated Julia set.
See Get started for more information.
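For readers who want to see what a single pixel of the fractal computes, here is a plain-Python sketch of the same iteration. The functions `escape_iterations` and `julia_pixel` are hypothetical helpers written for this illustration. Because Python floats are 64-bit and Taichi's default `float` is 32-bit, iteration counts for points near the fractal boundary may differ slightly from the kernel's.

```python
import math

def complex_sqr(z):
    """Square of the complex number z = (re, im), as in the ti.func above."""
    return (z[0] ** 2 - z[1] ** 2, z[1] * z[0] * 2)

def escape_iterations(z, c, radius=20.0, max_iterations=50):
    """Count iterations of z -> z^2 + c until |z| reaches radius or the cap is hit."""
    iterations = 0
    while math.hypot(z[0], z[1]) < radius and iterations < max_iterations:
        sq = complex_sqr(z)
        z = (sq[0] + c[0], sq[1] + c[1])
        iterations += 1
    return iterations

def julia_pixel(i, j, n, t):
    """Brightness of pixel (i, j), mirroring the body of the paint kernel."""
    c = (-0.8, math.cos(t) * 0.2)
    z = ((i / n - 1) * 2, (j / n - 0.5) * 2)
    return 1 - escape_iterations(z, c) * 0.02

print(complex_sqr((1.0, 2.0)), complex(1.0, 2.0) ** 2)
print(julia_pixel(0, 0, 320, 0.0))
```

Points that escape quickly come out bright (values near 1), while points that survive all 50 iterations come out dark (values near 0).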
Build from source
If you wish to try our experimental features or build Taichi Lang for your own environments, see Developer installation.
Documentation
Community activity 
Contributing
Kudos to all of our amazing contributors! Taichi Lang thrives through open-source. In that spirit, we welcome all kinds of contributions from the community. If you would like to participate, check out the Contribution Guidelines first.
License
Taichi Lang is distributed under the terms of Apache License (Version 2.0).
See Apache License for details.
Community
For more information about the events or community, please refer to this page
Join our discussions
Report an issue
- If you spot a technical or documentation issue, file an issue at GitHub Issues.
- If you spot any security issue, mail directly to security@taichi.graphics.
Contact us
Reference
Demos
- Nerf with Taichi
- Taichi Lang examples
- Advanced Taichi Lang examples
- Awesome Taichi
- DiffTaichi
- Taichi elements
- Taichi Houdini
- More...
AOT deployment
Lectures & talks
- SIGGRAPH 2020 course on Taichi basics: YouTube, Bilibili, slides (pdf).
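- Chinagraph 2020, "Writing Physics Engines with Taichi": Bilibili
- GAMES 201, "Advanced Physics Engines: A Hands-on Guide" (2020): course materials
- Taichi Graphics Course, Season 1: course materials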
- TaichiCon: Taichi Developer Conferences
- More to come...
Citations
If you use Taichi Lang in your research, please cite the corresponding papers: