kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction


Top Related Projects

thrust

4,974

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

hip

4,100

HIP: C++ Heterogeneous-Compute Interface for Portability

compute

1,591

A C++ GPU Computing Library for OpenCL

Quick Overview

Kokkos is a performance-portable programming model for parallel execution and memory abstraction. It aims to enable developers to write high-performance applications that can run efficiently on diverse hardware architectures, including multi-core CPUs, GPUs, and accelerators, without the need to rewrite code for each platform.

Pros

Performance portability across various hardware architectures
Simplified parallel programming model with a unified interface
Extensive support for different memory spaces and execution policies
Active development and maintenance by a dedicated community

Cons

Steep learning curve for developers new to parallel programming
Potential overhead in some cases compared to platform-specific optimizations
Limited support for certain specialized hardware or niche architectures
Requires careful design considerations to fully leverage its capabilities

Code Examples

Basic parallel for loop:

#include <Kokkos_Core.hpp>

void example_parallel_for() {
  Kokkos::parallel_for(100, KOKKOS_LAMBDA(const int i) {
    // Perform operation on each index.
    // Kokkos::printf (Kokkos 4.2 and later) is callable from host and device code.
    Kokkos::printf("Index: %d\n", i);
  });
}

Reduction example:

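#include <Kokkos_Core.hpp>
#include <cstdio>

void example_reduction() {
  int sum = 0;
  // Each iteration adds its index to a partial sum; Kokkos combines the partials into sum.
  Kokkos::parallel_reduce(100, KOKKOS_LAMBDA(const int i, int& local_sum) {
    local_sum += i;
  }, sum);
  printf("Sum: %d\n", sum);  // host-side print of the final result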
}
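
To make the reduction contract concrete, here is a minimal serial sketch in standard C++ (no Kokkos required) of what a sum reduction conceptually does: the index range is split into chunks, each chunk accumulates into its own partial sum that starts at zero, and the partials are combined at the end. The helper name chunked_sum_reduce and the chunk count are illustrative assumptions, not part of the Kokkos API, and the way Kokkos actually schedules work is backend-specific.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not a Kokkos function): emulates a sum reduction by
// giving each chunk of the index range its own accumulator.
// Assumes num_chunks > 0.
template <class Body>
int chunked_sum_reduce(int n, int num_chunks, Body body) {
  std::vector<int> partial(num_chunks, 0);  // each partial starts at the identity for +
  const int chunk = (n + num_chunks - 1) / num_chunks;  // ceiling division
  for (int c = 0; c < num_chunks; ++c) {
    const int begin = std::min(n, c * chunk);
    const int end = std::min(n, begin + chunk);
    for (int i = begin; i < end; ++i) {
      body(i, partial[c]);  // same shape as the lambda passed to parallel_reduce
    }
  }
  int result = 0;
  for (int p : partial) result += p;  // join step: combine the partial sums
  return result;
}

// Sums the indices 0..99 with the same body as the Kokkos example.
int sum_first_hundred() {
  return chunked_sum_reduce(100, 8, [](const int i, int& local_sum) { local_sum += i; });
}
```

Because every accumulator starts from the identity and the partials are combined afterwards, the body should only accumulate into local_sum (for example with +=) and never assign to it.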

View creation and usage:

#include <Kokkos_Core.hpp>

void example_view() {
  Kokkos::View<double**> matrix("Matrix", 10, 10);
  Kokkos::parallel_for(10, KOKKOS_LAMBDA(const int i) {
    for (int j = 0; j < 10; ++j) {
      matrix(i, j) = i * j;
    }
  });
}

Getting Started

To get started with Kokkos:

Clone the repository:

git clone https://github.com/kokkos/kokkos.git

Configure and build:

cd kokkos
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/kokkos
make install

Include Kokkos in your project:

#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  // Your Kokkos code here
  Kokkos::finalize();
  return 0;
}

Compile your project with Kokkos:

g++ -std=c++17 -I/path/to/kokkos/include your_file.cpp -L/path/to/kokkos/lib -lkokkoscore -ldl

Competitor Comparisons

thrust

4,974

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

Pros of Thrust

Highly optimized for NVIDIA GPUs, offering excellent performance on CUDA-enabled devices
Extensive set of parallel algorithms and data structures out-of-the-box
Seamless integration with CUDA and C++ STL

Cons of Thrust

Limited portability across different hardware architectures
Less flexibility for custom memory spaces and execution policies
Steeper learning curve for developers not familiar with CUDA ecosystem

Code Comparison

Thrust:

thrust::device_vector<float> d_vec(1000);
thrust::fill(d_vec.begin(), d_vec.end(), 1.0f);
float sum = thrust::reduce(d_vec.begin(), d_vec.end());

Kokkos:

Kokkos::View<float*> d_vec("d_vec", 1000);
Kokkos::parallel_for(1000, KOKKOS_LAMBDA(const int i) {
  d_vec(i) = 1.0f;
});
// parallel_reduce returns void; the result is written to its last argument.
float sum = 0.0f;
Kokkos::parallel_reduce(1000, KOKKOS_LAMBDA(const int i, float& lsum) {
  lsum += d_vec(i);
}, sum);

Both Thrust and Kokkos aim to simplify parallel programming, but they have different strengths. Thrust excels in NVIDIA GPU environments, offering high performance and a rich set of algorithms. Kokkos provides better portability across various architectures and more flexibility in execution policies, making it suitable for a wider range of HPC applications.

hip

4,100

HIP: C++ Heterogeneous-Compute Interface for Portability

Pros of HIP

Specifically designed for AMD GPUs, offering optimized performance on AMD hardware
Easier transition for CUDA developers due to similar syntax and concepts
Supports both AMD and NVIDIA GPUs, providing more flexibility

Cons of HIP

Less mature ecosystem compared to Kokkos, with fewer tools and libraries
Limited support for non-GPU architectures, whereas Kokkos targets a broader range of platforms
Steeper learning curve for developers not familiar with CUDA-like programming models

Code Comparison

Kokkos:

Kokkos::parallel_for(N, KOKKOS_LAMBDA(const int i) {
  c[i] = a[i] + b[i];
});

HIP:

// The kernel must be declared before it is launched.
__global__ void vector_add(float *a, float *b, float *c, int n) {
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < n) c[i] = a[i] + b[i];  // guard against threads past the end
}

hipLaunchKernelGGL(vector_add, dim3(grid_size), dim3(block_size), 0, 0, d_a, d_b, d_c, N);

Both Kokkos and HIP aim to provide portable performance for high-performance computing applications. Kokkos offers a more abstract approach, focusing on performance portability across various architectures, while HIP provides a lower-level, CUDA-like programming model specifically tailored for AMD GPUs with NVIDIA support.

compute

1,591

A C++ GPU Computing Library for OpenCL

Pros of Compute

Part of the Boost ecosystem, benefiting from its extensive documentation and community support
Focuses specifically on GPU computing, offering a more specialized solution for GPU-related tasks
Provides a higher-level abstraction for GPGPU programming, potentially easier for beginners

Cons of Compute

Limited to GPU computing, while Kokkos offers a more versatile approach for various hardware architectures
Less active development and updates compared to Kokkos
Smaller community and fewer resources available for troubleshooting and support

Code Comparison

Compute example:

compute::vector<float> vec(1000);
compute::fill(vec.begin(), vec.end(), 42);
compute::transform(vec.begin(), vec.end(), vec.begin(), compute::_1 * 2);

Kokkos example:

Kokkos::View<float*> vec("vec", 1000);
Kokkos::parallel_for(1000, KOKKOS_LAMBDA(const int i) {
  vec(i) = 42;
});
Kokkos::parallel_for(1000, KOKKOS_LAMBDA(const int i) {
  vec(i) *= 2;
});

Both libraries aim to simplify parallel computing, but Kokkos provides a more general-purpose solution for various architectures, while Compute focuses on GPU-specific optimizations within the Boost framework.


README

Kokkos: Core Libraries

Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It currently can use CUDA, HIP, SYCL, HPX, OpenMP and C++ threads as backend programming models with several other backends in development.

Kokkos Core is part of the Kokkos C++ Performance Portability Programming Ecosystem.

Kokkos is a Linux Foundation project.

Learning about Kokkos

To start learning about Kokkos:

Kokkos Lectures: they contain a mix of lecture videos and hands-on exercises covering all the important capabilities.
Programming guide: contains in "narrative" form a technical description of the programming model, machine model, and the main building blocks like the Views and parallel dispatch.
API reference: organized by category, i.e., core, algorithms and containers or, if you prefer, in alphabetical order.
Use cases and Examples: a series of examples ranging from how to use Kokkos with MPI to Fortran interoperability.

Obtaining Kokkos

The latest release of Kokkos can be obtained from the GitHub releases page.

The current release is 4.5.01.

curl -OJ -L https://github.com/kokkos/kokkos/releases/download/4.5.01/kokkos-4.5.01.tar.gz
# Or with wget
wget https://github.com/kokkos/kokkos/releases/download/4.5.01/kokkos-4.5.01.tar.gz

To clone the latest development version of Kokkos from GitHub:

git clone -b develop https://github.com/kokkos/kokkos.git

Building Kokkos

To build Kokkos, you will need to have a C++ compiler that supports C++17 or later. All requirements including minimum and primary tested compiler versions can be found here.

Building and installation instructions are described here.

You can also install Kokkos using Spack: spack install kokkos. Available configuration options can be displayed using spack info kokkos.

For the complete documentation: kokkos.org/kokkos-core-wiki/

Support

For questions find us on Slack: https://kokkosteam.slack.com or open a GitHub issue.

For non-public questions send an email to: crtrott(at)sandia.gov

Contributing

Please see this page for details on how to contribute.

Citing Kokkos

Please see the following page.

License

Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.

The full license statement used in all headers is available here or here.
