Top Related Projects
- ArrayFire: a general purpose GPU library.
- HIP: C++ Heterogeneous-Compute Interface for Portability
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- TensorFlow: An Open Source Machine Learning Framework for Everyone
Quick Overview
Boost.Compute is a C++ GPU computing library based on OpenCL. It provides a high-level, STL-like interface for GPU-accelerated computing, making it easier for developers to harness the power of GPUs for parallel processing tasks.
Pros
- Seamless integration with C++ and the STL, allowing for familiar programming patterns
- Cross-platform support, working on various GPU architectures
- High-level abstractions that simplify GPU programming
- Extensive documentation and examples
Cons
- Requires OpenCL support, which may not be available on all systems
- Performance may not always match hand-optimized OpenCL code
- Learning curve for developers new to GPU programming concepts
- Limited support for newer GPU features compared to vendor-specific libraries
Code Examples
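All of the snippets below assume a device, context, and command queue have already been created, using the same setup described in the Getting Started section:
#include <boost/compute/core.hpp>
namespace compute = boost::compute;
// select the default compute device and create a context and queue for it
compute::device device = compute::system::default_device();
compute::context context(device);
compute::command_queue queue(context, device);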
- Vector addition:
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
namespace compute = boost::compute;
// device vectors; fill a and b with data (e.g. via compute::copy) before use
compute::vector<float> a(1000, context);
compute::vector<float> b(1000, context);
compute::vector<float> c(1000, context);
// c[i] = a[i] + b[i], computed on the device
compute::transform(
    a.begin(), a.end(),
    b.begin(),
    c.begin(),
    compute::plus<float>(),
    queue
);
- Parallel reduction:
#include <boost/compute/algorithm/reduce.hpp>
// reduce() writes its result through an output iterator (a plain host
// pointer works for a single value); compute::accumulate() provides the
// STL-style form that takes an initial value and returns the result
float sum = 0.0f;
compute::reduce(
    vector.begin(),
    vector.end(),
    &sum,
    compute::plus<float>(),
    queue
);
- Custom kernel:
#include <boost/compute/kernel.hpp>
// OpenCL C source for a kernel that doubles each input element
const char source[] = BOOST_COMPUTE_STRINGIZE_SOURCE(
    __kernel void custom_kernel(__global float* input, __global float* output)
    {
        const uint i = get_global_id(0);
        output[i] = input[i] * 2.0f;
    }
);
// compile the kernel and bind the buffers of two previously created
// compute::vector<float> containers named input and output
compute::kernel kernel = compute::kernel::create_with_source(source, "custom_kernel", context);
kernel.set_arg(0, input.get_buffer());
kernel.set_arg(1, output.get_buffer());
// launch one work-item per element
queue.enqueue_1d_range_kernel(kernel, 0, input.size(), 0);
Getting Started
1. Install Boost.Compute:
git clone https://github.com/boostorg/compute.git
cd compute
mkdir build && cd build
cmake ..
make install
(Boost.Compute is header-only, so alternatively you can simply add its include directory to your compiler's include path.)
2. Include the necessary headers in your C++ file:
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/container/vector.hpp>
3. Initialize the Boost.Compute environment:
namespace compute = boost::compute;
compute::device device = compute::system::default_device();
compute::context context(device);
compute::command_queue queue(context, device);
4. Start using Boost.Compute in your code! A minimal example follows.
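For instance, here is a minimal sketch that round-trips data through the device, using only the headers from step 2 and the setup from step 3:
#include <vector>
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/container/vector.hpp>
namespace compute = boost::compute;
int main()
{
    compute::device device = compute::system::default_device();
    compute::context context(device);
    compute::command_queue queue(context, device);
    // host data
    std::vector<int> host = {1, 2, 3, 4};
    // copy to the device and back again
    compute::vector<int> dev(host.size(), context);
    compute::copy(host.begin(), host.end(), dev.begin(), queue);
    compute::copy(dev.begin(), dev.end(), host.begin(), queue);
    return 0;
}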
Competitor Comparisons
ArrayFire: a general purpose GPU library.
Pros of ArrayFire
- Supports multiple backends (CUDA, OpenCL, CPU) for greater flexibility
- Extensive library of pre-built functions for various domains (linear algebra, signal processing, etc.)
- Active development and regular updates
Cons of ArrayFire
- Larger codebase and potentially steeper learning curve
- May have higher memory usage due to its comprehensive feature set
Code Comparison
ArrayFire:
af::array A = af::randu(5, 5);
af::array B = af::constant(1, 5, 5);
af::array C = af::matmul(A, B);
Compute:
boost::compute::vector<float> A(25, context);
boost::compute::vector<float> B(25, context);
boost::compute::vector<float> C(25, context);
// element-wise product; unlike ArrayFire, Boost.Compute has no built-in
// matmul, so a true matrix product would require a custom kernel
boost::compute::transform(A.begin(), A.end(), B.begin(), C.begin(),
                          boost::compute::multiplies<float>(), queue);
Key Differences
- ArrayFire provides higher-level abstractions and built-in functions
- Compute offers more fine-grained control over GPU operations
- ArrayFire's syntax is more concise for complex operations
- Compute integrates well with other Boost libraries
Use Cases
- ArrayFire: Rapid prototyping, scientific computing, machine learning
- Compute: Performance-critical applications, custom GPU algorithms, Boost ecosystem integration
HIP: C++ Heterogeneous-Compute Interface for Portability
Pros of HIP
- Broader hardware support, including AMD GPUs and potentially other accelerators
- More active development and community support
- Closer to native CUDA syntax, potentially easier for CUDA developers to adopt
Cons of HIP
- Less mature and stable compared to Boost.Compute
- May require more frequent updates to keep up with hardware changes
- Potentially more complex setup and configuration process
Code Comparison
Boost.Compute example:
compute::vector<int> vec(1000);
compute::fill(vec.begin(), vec.end(), 42);
compute::sort(vec.begin(), vec.end());
HIP example:
int* d_vec;
hipMalloc(&d_vec, 1000 * sizeof(int));
// fill_kernel and sort_kernel are user-written kernels (a sketch of
// fill_kernel follows below); HIP itself ships no built-in fill or sort
hipLaunchKernelGGL(fill_kernel, dim3(32), dim3(32), 0, 0, d_vec, 42, 1000);
hipLaunchKernelGGL(sort_kernel, dim3(32), dim3(32), 0, 0, d_vec, 1000);
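For context, fill_kernel might look like the following sketch (the kernel name and signature are assumptions chosen to match the launch above, not part of HIP):
#include <hip/hip_runtime.h>
// hypothetical kernel: writes value into each of the n ints in data
__global__ void fill_kernel(int* data, int value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}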
Both libraries aim to simplify GPU programming, but HIP focuses on providing a unified interface for CUDA and AMD GPUs, while Boost.Compute offers a more abstracted C++ approach. HIP's syntax is closer to CUDA, which may be familiar to many developers, while Boost.Compute provides a higher-level interface that integrates well with the C++ Standard Library.
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Extensive deep learning framework with a large ecosystem and community support
- Dynamic computational graphs for flexible model development
- Seamless integration with Python and NumPy
Cons of PyTorch
- Steeper learning curve for beginners compared to Compute
- Higher resource requirements for installation and usage
- Less focus on general-purpose GPU computing
Code Comparison
PyTorch:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)
Compute:
#include <boost/compute.hpp>
namespace compute = boost::compute;
// these calls use the system default command queue when none is passed
compute::vector<int> x = {1, 2, 3};
compute::vector<int> y = {4, 5, 6};
compute::vector<int> z(3);
compute::transform(x.begin(), x.end(), y.begin(), z.begin(), compute::plus<int>());
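To mirror PyTorch's print(z), the device result can be copied back to the host, e.g. (a sketch reusing x, y, and z from above):
#include <iostream>
#include <vector>
// copy the device result into a host vector and print it
std::vector<int> host_z(3);
compute::copy(z.begin(), z.end(), host_z.begin());
for (int v : host_z)
    std::cout << v << " ";  // prints: 5 7 9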
PyTorch offers a more concise and Python-friendly syntax, while Compute provides lower-level control and integration with C++ ecosystems. PyTorch is primarily designed for deep learning tasks, whereas Compute focuses on general-purpose GPU computing across various domains.
TensorFlow: An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Extensive ecosystem with high-level APIs and tools for machine learning
- Strong support for distributed computing and GPU acceleration
- Large community and extensive documentation
Cons of TensorFlow
- Steeper learning curve for beginners
- Larger footprint and slower compilation times
- Less flexible for general-purpose GPU computing tasks
Code Comparison
TensorFlow (Python):
import tensorflow as tf
x = tf.constant([[1], [2], [3], [4]], dtype=tf.float32)
y = tf.constant([[0], [-1], [-2], [-3]], dtype=tf.float32)
linear_model = tf.keras.layers.Dense(units=1)
y_pred = linear_model(x)
loss = tf.keras.losses.MeanSquaredError()(y, y_pred)
Boost.Compute (C++):
#include <boost/compute.hpp>
namespace compute = boost::compute;
// assumes a command queue set up as in the Getting Started steps
compute::vector<float> x = {1, 2, 3, 4};
compute::vector<float> y = {0, -1, -2, -3};
// y[i] = x[i] - y[i], computed in place on the device
compute::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   compute::minus<float>(), queue);
TensorFlow is more suited for complex machine learning tasks, while Boost.Compute provides a lower-level interface for general-purpose GPU computing. TensorFlow offers a higher level of abstraction, making it easier to implement complex models, but Boost.Compute allows for more fine-grained control over GPU operations.
README
Boost.Compute
Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL.
The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers.
On top of the core library is a generic, STL-like interface providing common algorithms (e.g. transform(), accumulate(), sort()) along with common containers (e.g. vector<T>, flat_set<T>). It also features a number of extensions including parallel-computing algorithms (e.g. exclusive_scan(), scatter(), reduce()) and a number of fancy iterators (e.g. transform_iterator<>, permutation_iterator<>, zip_iterator<>).
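As a small illustration of the core layer, here is a sketch that allocates a raw device buffer and writes host data into it (the buffer size and contents are arbitrary):
#include <boost/compute/core.hpp>
namespace compute = boost::compute;
int main()
{
    // core layer: direct access to device, context, queue, and buffers
    compute::device gpu = compute::system::default_device();
    compute::context ctx(gpu);
    compute::command_queue queue(ctx, gpu);
    // allocate a raw device buffer holding 1024 floats
    compute::buffer buf(ctx, 1024 * sizeof(float));
    // copy host data into the buffer through the command queue
    float host[1024] = {0};
    queue.enqueue_write_buffer(buf, 0, sizeof(host), host);
    return 0;
}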
The full documentation is available at http://boostorg.github.io/compute/.
Example
The following example shows how to sort a vector of floats on the GPU:
#include <vector>
#include <algorithm>
#include <cstdlib> // for rand()
#include <boost/compute.hpp>
namespace compute = boost::compute;
int main()
{
    // get the default compute device
    compute::device gpu = compute::system::default_device();
    // create a compute context and command queue
    compute::context ctx(gpu);
    compute::command_queue queue(ctx, gpu);
    // generate random numbers on the host
    std::vector<float> host_vector(1000000);
    std::generate(host_vector.begin(), host_vector.end(), rand);
    // create vector on the device
    compute::vector<float> device_vector(1000000, ctx);
    // copy data to the device
    compute::copy(
        host_vector.begin(), host_vector.end(), device_vector.begin(), queue
    );
    // sort data on the device
    compute::sort(
        device_vector.begin(), device_vector.end(), queue
    );
    // copy data back to the host
    compute::copy(
        device_vector.begin(), device_vector.end(), host_vector.begin(), queue
    );
    return 0;
}
Boost.Compute is a header-only library, so no linking is required. The example above can be compiled with:
g++ -I/path/to/compute/include sort.cpp -lOpenCL
More examples can be found in the tutorial and under the examples directory.
Support
Questions about the library (both usage and development) can be posted to the mailing list.
Bugs and feature requests can be reported through the issue tracker.
Also feel free to send me an email with any problems, questions, or feedback.
Help Wanted
The Boost.Compute project is currently looking for additional developers with interest in parallel computing.
Please send an email to Kyle Lutz (kyle.r.lutz@gmail.com) for more information.