Top Related Projects
- Triton: Development repository for the Triton language and compiler
- DeepSpeed: A deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- JAX: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- Faiss: A library for efficient similarity search and clustering of dense vectors.
Quick Overview
GGML (Georgi Gerganov Machine Learning) is a tensor library for machine learning, focusing on efficient CPU and GPU inference. It's designed to be lightweight, portable, and easy to integrate into existing projects, with a particular emphasis on running large language models on consumer hardware.
Pros
- Highly optimized for CPU and GPU performance
- Supports quantization for reduced memory usage and faster inference
- Easy to integrate into existing C/C++ projects
- Actively maintained and rapidly evolving
Cons
- Limited documentation and examples compared to more established libraries
- Primarily focused on inference, not training
- Steeper learning curve for those not familiar with low-level C programming
- May require manual optimization for specific hardware configurations
Code Examples
- Creating tensors and defining a matrix multiplication (a minimal sketch; the exact ggml_init_params fields may vary slightly between ggml versions):
#include "ggml.h"

int main(void) {
    // ggml_init takes a params struct describing the context's working buffer
    struct ggml_init_params params = { .mem_size = 16*1024*1024, .mem_buffer = NULL, .no_alloc = false };
    struct ggml_context * ctx = ggml_init(params);
    // ggml_mul_mat requires both operands to share ne[0] (the inner dimension)
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3); // 4 x 3
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2); // 4 x 2
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);                      // defines a 3 x 2 result node
    (void) c; // computing c requires building a graph (see the next example)
    ggml_free(ctx);
    return 0;
}
- Defining and computing a small graph (an inference-style sketch; no real model weights are loaded here, and the graph-API names follow recent ggml versions):
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = { .mem_size = 64*1024*1024, .mem_buffer = NULL, .no_alloc = false };
    struct ggml_context * ctx = ggml_init(params);
    struct ggml_tensor * input   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 768);
    struct ggml_tensor * weights = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 768, 512);
    struct ggml_tensor * output  = ggml_mul_mat(ctx, weights, input); // 512-element result
    // in a real program, fill input and weights with data first (e.g. via ggml_set_f32)
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, output);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 4);
    printf("output[0] = %f\n", (double) ggml_get_f32_1d(output, 0));
    ggml_free(ctx);
    return 0;
}
- Using quantized tensors for reduced memory usage (a sketch; in practice quantized weights come from model-conversion scripts rather than being allocated empty):
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = { .mem_size = 16*1024*1024, .mem_buffer = NULL, .no_alloc = false };
    struct ggml_context * ctx = ggml_init(params);
    // 4-bit quantized weights (Q4_0) use roughly 8x less memory than F32
    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_Q4_0, 1024, 1024);
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    // quantized weights are consumed directly by ggml_mul_mat, which dequantizes blocks on the fly
    struct ggml_tensor * y = ggml_mul_mat(ctx, w, x);
    (void) y; // compute via a graph as in the previous example
    ggml_free(ctx);
    return 0;
}
Getting Started
To use GGML in your project (a minimal example file is sketched after these steps):
1. Clone the repository:
git clone https://github.com/ggerganov/ggml.git
2. Include GGML in your source file:
#include "ggml.h"
3. Compile your file and link it against the GGML library (built beforehand, see the Build section below):
gcc -I./ggml/include -c your_file.c
gcc -o your_program your_file.o ./ggml/build/libggml.a -lm
4. Run your program:
./your_program
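For a quick smoke test, your_file.c can be as small as the following sketch (the file name and tensor shape are just placeholders):
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = { .mem_size = 1024*1024, .mem_buffer = NULL, .no_alloc = false };
    struct ggml_context * ctx = ggml_init(params);
    struct ggml_tensor * t = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    printf("created a %lld x %lld tensor\n", (long long) t->ne[0], (long long) t->ne[1]);
    ggml_free(ctx);
    return 0;
}
If this compiles, links, and prints the tensor shape, the include and library paths from step 3 are set up correctly.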
Competitor Comparisons
Development repository for the Triton language and compiler
Pros of Triton
- Designed for GPU programming with a focus on high-performance computing
- Offers a Python-based domain-specific language for easier GPU kernel development
- Provides automatic optimization and code generation for different GPU architectures
Cons of Triton
- Steeper learning curve due to its specialized nature and GPU-specific concepts
- More limited in scope, primarily focused on GPU acceleration rather than general-purpose machine learning
Code Comparison
GGML example (matrix multiplication, public API declaration):
// records a matrix-multiplication node on the compute graph
struct ggml_tensor * ggml_mul_mat(
    struct ggml_context * ctx,
    struct ggml_tensor  * a,
    struct ggml_tensor  * b);
Triton example (matrix multiplication):
@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K, BLOCK_SIZE: tl.constexpr):
# Kernel implementation details
Summary
GGML is a general-purpose machine learning library with a focus on efficiency and portability, while Triton is specifically designed for GPU programming and optimization. GGML offers broader applicability across different hardware, whereas Triton excels in GPU-specific tasks and provides a more accessible way to write high-performance GPU kernels.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Optimized for distributed training across multiple GPUs and nodes
- Supports a wide range of AI models and frameworks (PyTorch, TensorFlow, etc.)
- Offers advanced features like ZeRO optimizer and pipeline parallelism
Cons of DeepSpeed
- More complex setup and configuration compared to GGML
- Primarily focused on large-scale training, may be overkill for smaller projects
- Steeper learning curve for beginners
Code Comparison
GGML (simple model evaluation):
struct ggml_context * ctx = ggml_init(params);
struct ggml_tensor * input = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 28, 28);
// model_eval is a user-defined function that builds and runs the model's graph
struct ggml_tensor * output = model_eval(ctx, input);
DeepSpeed (model initialization with ZeRO):
model_engine, optimizer, _, _ = deepspeed.initialize(
args=args,
model=model,
model_parameters=model.parameters(),
config=ds_config
)
Both libraries aim to optimize AI model performance, but DeepSpeed focuses on distributed training for large-scale models, while GGML emphasizes efficiency for smaller devices and inference. DeepSpeed offers more advanced features but requires more setup, whereas GGML provides a simpler interface for quick integration in C/C++ projects.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Extensive ecosystem with a wide range of pre-built models and tools
- Strong community support and frequent updates
- Seamless integration with CUDA for GPU acceleration
Cons of PyTorch
- Larger memory footprint and slower inference times
- Steeper learning curve for beginners
- More complex setup and deployment process
Code Comparison
PyTorch example:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.matmul(x, y)
GGML example:
#include "ggml.h"
// ctx is a ggml_context created earlier with ggml_init()
struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3);
struct ggml_tensor * y = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3);
struct ggml_tensor * z = ggml_mul_mat(ctx, x, y); // 1-element result: the dot product of x and y
Summary
PyTorch offers a comprehensive deep learning framework with extensive features and community support, while GGML focuses on efficient inference for smaller models. PyTorch excels in research and large-scale projects, whereas GGML is better suited for lightweight applications and embedded systems. The choice between the two depends on the specific requirements of your project, such as model size, deployment environment, and performance constraints.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Pros of JAX
- Supports automatic differentiation and GPU/TPU acceleration
- Offers a more comprehensive machine learning ecosystem
- Provides better integration with popular ML frameworks like TensorFlow and PyTorch
Cons of JAX
- Higher learning curve and complexity compared to GGML
- Larger codebase and potentially slower compilation times
- May be overkill for simpler machine learning tasks
Code Comparison
GGML (simple matrix multiplication):
struct ggml_tensor * result = ggml_mul_mat(ctx, A, B);
JAX (simple matrix multiplication):
import jax.numpy as jnp
result = jnp.dot(A, B)
GGML focuses on simplicity and efficiency for specific tasks, while JAX offers a more comprehensive set of tools for machine learning and scientific computing. GGML is written in C and designed for performance, especially in resource-constrained environments. JAX, on the other hand, is Python-based and provides a wider range of features, including automatic differentiation and hardware acceleration.
GGML is better suited for projects requiring lightweight, efficient implementations of specific ML algorithms, particularly in embedded systems or applications with limited resources. JAX is more appropriate for complex machine learning projects, research, and applications that benefit from its extensive ecosystem and integration with other ML frameworks.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Extensive ecosystem with robust tools, libraries, and community support
- Highly scalable for large-scale machine learning and deep learning projects
- Supports multiple programming languages and platforms
Cons of TensorFlow
- Steeper learning curve, especially for beginners
- Can be resource-intensive and slower for smaller projects
- More complex setup and configuration process
Code Comparison
GGML (simple matrix multiplication):
struct ggml_tensor * result = ggml_mul_mat(ctx, A, B);
TensorFlow (equivalent operation):
import tensorflow as tf
result = tf.matmul(A, B)
GGML focuses on simplicity and efficiency for smaller-scale projects, particularly in C/C++ environments. It's designed for lightweight implementations and easy integration into existing codebases.
TensorFlow, on the other hand, offers a comprehensive framework for building and deploying machine learning models at scale. It provides a wide range of tools and libraries for various ML tasks, making it suitable for complex, large-scale projects across different domains.
While GGML excels in simplicity and performance for specific use cases, TensorFlow's versatility and extensive ecosystem make it a popular choice for diverse machine learning applications.
A library for efficient similarity search and clustering of dense vectors.
Pros of Faiss
- Highly optimized for similarity search and clustering of dense vectors
- Supports GPU acceleration for faster processing
- Extensive documentation and well-established community support
Cons of Faiss
- Primarily focused on vector similarity search, less versatile for general machine learning tasks
- Steeper learning curve due to its specialized nature
- Larger codebase and dependencies compared to GGML
Code Comparison
GGML (matrix multiplication, public API declaration):
struct ggml_tensor * ggml_mul_mat(
    struct ggml_context * ctx,
    struct ggml_tensor  * a,
    struct ggml_tensor  * b);
Faiss (vector indexing):
faiss::IndexFlatL2 index(d);
index.add(n, xb);
index.search(nq, xq, k, distances, labels);
GGML focuses on efficient tensor operations for machine learning, while Faiss specializes in vector similarity search and clustering. GGML's codebase is more compact and versatile, suitable for various ML tasks. Faiss excels in high-performance vector operations, particularly for large-scale similarity search applications. The code snippets highlight their different focuses: GGML on matrix operations and Faiss on vector indexing and search.
README
ggml
Tensor library for machine learning
Note that this project is under active development.
Some of the development is currently happening in the llama.cpp and whisper.cpp repos.
Features
- Low-level cross-platform implementation
- Integer quantization support
- Broad hardware support
- Automatic differentiation (a graph-building sketch follows this list)
- ADAM and L-BFGS optimizers
- No third-party dependencies
- Zero memory allocations during runtime
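To give a flavor of the core API, below is a sketch of defining and computing f(x) = a*x*x + b with a compute graph; it assumes the in-context graph API, and exact function names and signatures vary between ggml versions:
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = { .mem_size = 16*1024*1024, .mem_buffer = NULL, .no_alloc = false };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    ggml_set_param(ctx, x); // mark x as an input variable for the automatic differentiation machinery

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * f = ggml_add(ctx, ggml_mul(ctx, a, ggml_mul(ctx, x, x)), b);

    // build the forward compute graph and evaluate it
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, f);

    ggml_set_f32(x, 2.0f);
    ggml_set_f32(a, 3.0f);
    ggml_set_f32(b, 4.0f);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    printf("f(2) = %f\n", (double) ggml_get_f32_1d(f, 0)); // 3*2*2 + 4 = 16
    ggml_free(ctx);
    return 0;
}
Because x is marked as a parameter, a backward graph can be built on top of gf to obtain df/dx automatically; the exact backward-graph call has changed across versions, so check ggml.h for the one matching your checkout.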
Build
git clone https://github.com/ggml-org/ggml
cd ggml
# install python dependencies in a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
GPT inference (example)
# run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
For more information, check out the corresponding programs in the examples folder.
Using CUDA
# fix the path to point to your CUDA compiler
cmake -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc ..
Using hipBLAS
cmake -DCMAKE_C_COMPILER="$(hipconfig -l)/clang" -DCMAKE_CXX_COMPILER="$(hipconfig -l)/clang++" -DGGML_HIP=ON ..
Using SYCL
# linux
source /opt/intel/oneapi/setvars.sh
cmake -G "Ninja" -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL=ON ..
# windows
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
cmake -G "Ninja" -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx -DGGML_SYCL=ON ..
Compiling for Android
Download and unzip the Android NDK from the official NDK download page. Set the NDK_ROOT_PATH environment variable or pass the absolute path to CMAKE_ANDROID_NDK in the command below.
cmake .. \
-DCMAKE_SYSTEM_NAME=Android \
-DCMAKE_SYSTEM_VERSION=33 \
-DCMAKE_ANDROID_ARCH_ABI=arm64-v8a \
-DCMAKE_ANDROID_NDK=$NDK_ROOT_PATH \
-DCMAKE_ANDROID_STL_TYPE=c++_shared
# create directories
adb shell 'mkdir /data/local/tmp/bin'
adb shell 'mkdir /data/local/tmp/models'
# push the compiled binaries to the folder
adb push bin/* /data/local/tmp/bin/
# push the ggml library
adb push src/libggml.so /data/local/tmp/
# push model files
adb push models/gpt-2-117M/ggml-model.bin /data/local/tmp/models/
adb shell
cd /data/local/tmp
export LD_LIBRARY_PATH=/data/local/tmp
./bin/gpt-2-backend -m models/ggml-model.bin -p "this is an example"
Resources
Top Related Projects
Development repository for the Triton language and compiler
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
An Open Source Machine Learning Framework for Everyone
A library for efficient similarity search and clustering of dense vectors.
Convert
designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot