compute-runtime
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
Top Related Projects
- DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
- HIP: C++ Heterogeneous-Compute Interface for Portability
- Samples for CUDA developers that demonstrate features in the CUDA Toolkit (cuda-samples)
- An Open Source Machine Learning Framework for Everyone (TensorFlow)
- Tensors and Dynamic neural networks in Python with strong GPU acceleration (PyTorch)
Quick Overview
The intel/compute-runtime repository is an open-source project that provides the Intel Graphics Compute Runtime for OpenCL and oneAPI Level Zero. It enables developers to leverage Intel's integrated and discrete GPUs for general-purpose computing tasks, supporting a wide range of Intel processors.
Pros
- Supports both OpenCL and oneAPI Level Zero, providing flexibility for developers
- Optimized for Intel GPUs, offering high performance for compatible hardware
- Regularly updated with new features and improvements
- Open-source nature allows for community contributions and customizations
Cons
- Limited to Intel GPUs, not compatible with other manufacturers' hardware
- May require specific driver versions, which can lead to compatibility issues
- Learning curve for developers new to GPU computing or Intel's ecosystem
- Performance may vary depending on the specific Intel GPU model
Getting Started
To get started with the Intel Graphics Compute Runtime:
- Ensure you have a compatible Intel GPU and supported Linux distribution.
- Install the necessary dependencies:
sudo apt-get update
sudo apt-get install ocl-icd-libopencl1 opencl-headers clinfo
- Alternatively, clone the repository and build from source (note that a full build also depends on GmmLib and the Intel Graphics Compiler; the prebuilt packages from the release page are usually simpler):
git clone https://github.com/intel/compute-runtime.git
cd compute-runtime
mkdir build && cd build
cmake ..
make
sudo make install
- Verify the installation:
clinfo
This should display information about the available OpenCL platforms and devices, including your Intel GPU.
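If you prefer a programmatic check, a minimal OpenCL host program can confirm that the Intel GPU is visible through the ICD loader. This is an illustrative sketch, not code from the repository; it assumes the headers and loader installed in the step above and is built with something like g++ check.cpp -lOpenCL:
#include <CL/cl.h>
#include <cstdio>
int main() {
    // Query the first OpenCL platform exposed through the ICD loader
    cl_platform_id platform;
    cl_uint numPlatforms = 0;
    if (clGetPlatformIDs(1, &platform, &numPlatforms) != CL_SUCCESS || numPlatforms == 0) {
        std::printf("No OpenCL platform found\n");
        return 1;
    }
    // Ask that platform for a GPU device and print its name
    cl_device_id device;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        std::printf("No GPU device found on the first platform\n");
        return 1;
    }
    char name[256] = {0};
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    std::printf("GPU device: %s\n", name);
    return 0;
}
If this prints the name of your Intel GPU, the runtime is installed and visible to the ICD loader. Note that if other OpenCL platforms (for example a CPU runtime) are installed, the Intel GPU may not be on the first platform.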
Competitor Comparisons
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
Pros of DirectML
- Broader hardware support across multiple vendors (not limited to Intel)
- Designed for machine learning workloads, potentially offering better performance for AI/ML tasks
- Integration with DirectX ecosystem for graphics and compute
Cons of DirectML
- Windows-centric, less cross-platform support
- May have a steeper learning curve for developers not familiar with DirectX
Code Comparison
DirectML (simplified example):
dml::Expression input = dml::InputTensor(graph, 0, inputDesc);
dml::Expression weights = dml::InputTensor(graph, 1, weightsDesc);
dml::Expression output = dml::Convolution(input, weights);
Compute Runtime (OpenCL-based):
cl_kernel kernel = clCreateKernel(program, "convolution", NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputBuffer);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &weightsBuffer);
clEnqueueNDRangeKernel(queue, kernel, work_dim, NULL, global_work_size, local_work_size, 0, NULL, NULL);
Both repositories aim to provide efficient compute capabilities, but they target different ecosystems and use cases. DirectML focuses on machine learning acceleration within the Microsoft ecosystem, while Compute Runtime provides a more general-purpose compute solution for Intel hardware using OpenCL.
HIP: C++ Heterogeneous-Compute Interface for Portability
Pros of HIP
- Open-source and vendor-neutral, supporting multiple GPU architectures
- Easier porting of CUDA code to run on AMD GPUs
- Active community development and frequent updates
Cons of HIP
- No Intel GPU backend; HIP primarily targets AMD hardware, with a CUDA backend for NVIDIA
- Cannot target compute-runtime directly; HIP code must be ported (for example to SYCL) to run on Intel GPUs
- Steeper learning curve for developers new to GPU programming
Code Comparison
HIP:
#include <hip/hip_runtime.h>
__global__ void vectorAdd(float *a, float *b, float *c, int n) {
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < n) c[i] = a[i] + b[i];
}
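// Hypothetical launch for the kernel above: hipLaunchKernelGGL(vectorAdd, dim3(gridSize), dim3(blockSize), 0, 0, a, b, c, n);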
compute-runtime (SYCL, layered on the Level Zero/OpenCL APIs it implements):
#include <CL/sycl.hpp>
void vectorAdd(sycl::queue& q, float* a, float* b, float* c, int n) {
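// a, b and c are assumed to be USM allocations (e.g. sycl::malloc_shared) so the device can access them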
q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
c[i] = a[i] + b[i];
});
}
The HIP code uses CUDA-like syntax, while the second example uses SYCL, which reaches compute-runtime through the Level Zero or OpenCL APIs it implements. HIP's syntax will feel familiar to CUDA developers, while SYCL offers a more abstracted programming model that is portable across hardware vendors.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Pros of cuda-samples
- Extensive collection of CUDA code examples covering various GPU computing topics
- Well-documented samples with detailed explanations and performance tips
- Regular updates to support the latest CUDA features and best practices
Cons of cuda-samples
- Limited to NVIDIA GPUs and CUDA framework
- Samples may require specific NVIDIA hardware or driver versions
Code Comparison
compute-runtime (OpenCL):
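// Minimal device discovery; status should be checked against CL_SUCCESS after each call (omitted for brevity)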
cl_int status;
cl_platform_id platform;
status = clGetPlatformIDs(1, &platform, NULL);
cl_device_id device;
status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
cuda-samples (CUDA):
int deviceCount;
cudaError_t error = cudaGetDeviceCount(&deviceCount);
if (error != cudaSuccess) {
printf("Error: %s\n", cudaGetErrorString(error));
exit(EXIT_FAILURE);
}
The compute-runtime repository focuses on Intel's OpenCL implementation for their GPUs, while cuda-samples provides CUDA examples for NVIDIA GPUs. compute-runtime is more of a runtime implementation, whereas cuda-samples is a collection of educational examples. The code snippets show the different APIs used for device initialization in OpenCL and CUDA.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Broader ecosystem and community support
- More extensive documentation and learning resources
- Supports a wider range of hardware platforms
Cons of TensorFlow
- Larger codebase and potentially steeper learning curve
- May have higher overhead for simple tasks
- Less optimized for Intel-specific hardware
Code Comparison
TensorFlow example:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
Compute Runtime example:
#include <CL/cl.h>
cl_platform_id platform;
cl_device_id device;
clGetPlatformIDs(1, &platform, NULL);
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
Summary
TensorFlow is a more versatile and widely-used machine learning framework with extensive community support. Compute Runtime, on the other hand, is specifically designed for Intel hardware and may offer better performance optimization for Intel GPUs. TensorFlow provides higher-level abstractions for machine learning tasks, while Compute Runtime offers lower-level control over compute operations on Intel devices.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Broader machine learning framework with extensive ecosystem
- More active community and frequent updates
- Supports dynamic computational graphs for flexible model development
Cons of PyTorch
- Larger codebase and potentially steeper learning curve
- May have higher resource requirements for basic operations
- Less focused on specific hardware optimizations
Code Comparison
PyTorch example (tensor creation and basic operation):
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)
compute-runtime example (OpenCL kernel execution):
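// Assumes program, queue and the three cl_mem buffers were created earlier; global_size and local_size are size_t work sizes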
cl_int err;
cl_kernel kernel = clCreateKernel(program, "vector_add", &err);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer_A);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &buffer_B);
clSetKernelArg(kernel, 2, sizeof(cl_mem), &buffer_C);
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, &local_size, 0, NULL, NULL);
Summary
PyTorch is a comprehensive machine learning framework with a large community and ecosystem, while compute-runtime focuses on low-level GPU compute capabilities for Intel hardware. PyTorch offers more flexibility and ease of use for general machine learning tasks, but compute-runtime may provide better performance for specific Intel-based applications.
README
Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver
Introduction
The Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver is an open source project providing compute API support (Level Zero, OpenCL) for Intel graphics hardware architectures (HD Graphics, Xe).
What is NEO?
NEO is the shorthand name for Compute Runtime contained within this repository. It is also a development mindset that we adopted when we first started the implementation effort for OpenCL.
The project evolved beyond a single API and NEO no longer implies a specific API. When talking about a specific API, we will mention it by name (e.g. Level Zero, OpenCL).
License
The Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver is distributed under the MIT License.
You may obtain a copy of the License at: https://opensource.org/licenses/MIT
Supported Platforms
Platform | OpenCL | Level Zero |
---|---|---|
Intel Core Processors with Gen8 graphics devices (formerly Broadwell) | 3.0 | - |
Intel Core Processors with Gen9 graphics devices (formerly Skylake, Kaby Lake, Coffee Lake) | 3.0 | Y |
Intel Atom Processors with Gen9 graphics devices (formerly Apollo Lake, Gemini Lake) | 3.0 | - |
Intel Core Processors with Gen11 graphics devices (formerly Ice Lake) | 3.0 | Y |
Intel Atom Processors with Gen11 graphics devices (formerly Elkhart Lake) | 3.0 | - |
Intel Core Processors with Gen12 graphics devices (formerly Tiger Lake, Rocket Lake, Alder Lake) | 3.0 | Y |
Release cadence
The release cadence changed from weekly to monthly in late 2022.
- At the beginning of each calendar month, we identify a well-tested driver version from the previous month as a release candidate for our monthly release.
- We create a release branch and apply selected fixes for significant issues.
- The branch naming convention is releases/yy.ww (yy - year, ww - work week of release candidate)
- The builds are tagged using the following format: yy.ww.bbbbb.hh (yy - year, ww - work week, bbbbb - incremental build number from the master branch, hh - incremental commit number on release branch).
- We publish and document a monthly release from the tip of that branch.
- During subsequent weeks of a given month, we continue to cherry-pick fixes to that branch and may publish a hotfix release.
- Quality level of the driver (per platform) will be provided in the Release Notes.
Installation Options
To allow NEO to access the GPU device, make sure the user has permission to access the /dev/dri/renderD* files (typically granted by membership in the render or video group).
Via system package manager
NEO is available for installation on a variety of Linux distributions and can be installed via the distro's package manager.
For example on Ubuntu* 22.04:
apt-get install intel-opencl-icd
Manual download
.deb packages for Ubuntu are provided along with installation instructions and Release Notes on the release page
Linking applications
Directly linking to the runtime library is not supported:
- Level Zero applications should link with the Level Zero loader (see the sketch after this list)
- OpenCL applications should link with ICD loader library (ocl-icd)
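For illustration, a minimal Level Zero host program that goes through the loader might look like the following; it assumes the loader development package is installed and links with -lze_loader (a sketch, not code from this repository):
#include <level_zero/ze_api.h>
#include <cstdio>
#include <vector>
int main() {
    // Initialize Level Zero through the loader, asking for GPU drivers only
    if (zeInit(ZE_INIT_FLAG_GPU_ONLY) != ZE_RESULT_SUCCESS) {
        std::printf("zeInit failed\n");
        return 1;
    }
    // Enumerate the driver handles the loader has discovered (the compute runtime provides one)
    uint32_t driverCount = 0;
    zeDriverGet(&driverCount, nullptr);
    std::vector<ze_driver_handle_t> drivers(driverCount);
    zeDriverGet(&driverCount, drivers.data());
    std::printf("Level Zero drivers found: %u\n", driverCount);
    return 0;
}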
Dependencies
- GmmLib - https://github.com/intel/gmmlib
- Intel Graphics Compiler - https://github.com/intel/intel-graphics-compiler
In addition, to enable performance counters support, the following packages are needed:
- Intel(R) Metrics Discovery (MDAPI) - https://github.com/intel/metrics-discovery
- Intel(R) Metrics Library for MDAPI - https://github.com/intel/metrics-library
How to provide feedback
Please submit an issue using the native github.com interface.
How to contribute
Create a pull request on github.com with your patch. Make sure your change builds cleanly and passes the unit-level tests (ULTs). A maintainer will contact you if there are questions or concerns. See the contribution guidelines for more details.
See also
Level Zero specific
- oneAPI Level Zero specification
- Intel(R) oneAPI Level Zero Specification API C/C++ header files
- oneAPI Level Zero tests
OpenCL specific
- OpenCL on Linux guide
- Intel(R) GPU Compute Samples
- Frequently Asked Questions
- Interoperability with VTune
- OpenCL Conformance Tests
(*) Other names and brands may be claimed as property of others.