oneAPI-samples

Samples for Intel® oneAPI Toolkits

1,046

728


Top Related Projects

llvm

1,347

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

compute-runtime

1,247

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver

hip

4,100

HIP: C++ Heterogeneous-Compute Interface for Portability

cuda-samples

7,668

Samples for CUDA Developers which demonstrates features in CUDA Toolkit

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Quick Overview

oneAPI-samples is a GitHub repository containing code samples and tutorials for Intel's oneAPI toolkit. It demonstrates how to use various oneAPI components, including DPC++, oneMKL, oneDNN, and others, to develop high-performance, cross-architecture applications. The repository serves as a valuable resource for developers looking to leverage oneAPI's capabilities in their projects.

Pros

Comprehensive collection of samples covering various oneAPI components
Well-documented examples with detailed explanations and comments
Supports multiple programming languages (C++, Python, Fortran)
Regularly updated to reflect the latest oneAPI features and best practices

Cons

Requires Intel hardware or emulation for full functionality
Some samples may be complex for beginners
Limited coverage of certain advanced topics
May require additional setup and dependencies for specific examples

Code Examples

Simple vector addition using DPC++:

#include <sycl/sycl.hpp>  // SYCL 2020 header; <CL/sycl.hpp> is the older, deprecated spelling
#include <array>
#include <iostream>

constexpr int array_size = 10000;

int main() {
    std::array<int, array_size> a, b, c;
    for (int i = 0; i < array_size; i++) {
        a[i] = i;
        b[i] = array_size - i;
    }

    sycl::queue q;  // default device selection
    // Buffers wrap the host arrays so the runtime can manage data movement.
    sycl::buffer<int, 1> a_buf(a.data(), sycl::range<1>(array_size));
    sycl::buffer<int, 1> b_buf(b.data(), sycl::range<1>(array_size));
    sycl::buffer<int, 1> c_buf(c.data(), sycl::range<1>(array_size));

    q.submit([&](sycl::handler& h) {
        auto a_acc = a_buf.get_access<sycl::access::mode::read>(h);
        auto b_acc = b_buf.get_access<sycl::access::mode::read>(h);
        auto c_acc = c_buf.get_access<sycl::access::mode::write>(h);

        h.parallel_for<class vector_add>(sycl::range<1>(array_size),
            [=](sycl::id<1> i) {
                c_acc[i] = a_acc[i] + b_acc[i];
            });
    });

    // Constructing a host accessor blocks until the kernel has finished.
    sycl::host_accessor c_acc(c_buf, sycl::read_only);
    for (int i = 0; i < array_size; i++) {
        if (c_acc[i] != array_size) {
            std::cout << "Error: Incorrect result" << std::endl;
            return 1;
        }
    }

    std::cout << "Success!" << std::endl;
    return 0;
}

Using oneMKL for matrix multiplication:

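#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <algorithm>
#include <iostream>

int main() {
    sycl::queue q;
    const int m = 2000, n = 1000, k = 1000;

    // The pointer-based (USM) gemm needs memory the device can access;
    // storage owned by std::vector is ordinary host memory, so allocate shared USM.
    float* A = sycl::malloc_shared<float>(m * k, q);
    float* B = sycl::malloc_shared<float>(k * n, q);
    float* C = sycl::malloc_shared<float>(m * n, q);
    std::fill(A, A + m * k, 1.0f);
    std::fill(B, B + k * n, 2.0f);
    std::fill(C, C + m * n, 0.0f);

    // Column-major C = 1.0 * A * B + 0.0 * C, with lda = m, ldb = k, ldc = m.
    oneapi::mkl::blas::column_major::gemm(q, oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::nontrans,
                                          m, n, k, 1.0f, A, m, B, k, 0.0f, C, m).wait();

    std::cout << "Matrix multiplication completed." << std::endl;

    sycl::free(A, q);
    sycl::free(B, q);
    sycl::free(C, q);
    return 0;
}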
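
The gemm example above prints a completion message but does not check the result. A minimal sketch of a host-side reference, assuming the same column-major layout and no transposition, is shown here. The name reference_gemm is hypothetical and is not part of oneMKL; the function uses only the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a oneMKL function): naive column-major GEMM,
// C = alpha * A * B + beta * C, with no transposition.
// A is m x k (leading dimension m), B is k x n (leading dimension k),
// C is m x n (leading dimension m).
void reference_gemm(std::size_t m, std::size_t n, std::size_t k,
                    float alpha, const std::vector<float>& A,
                    const std::vector<float>& B,
                    float beta, std::vector<float>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < m; ++row) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                // Element (row, p) of A times element (p, col) of B.
                acc += A[row + p * m] * B[p + col * k];
            }
            C[row + col * m] = alpha * acc + beta * C[row + col * m];
        }
    }
}
```

With the inputs used in the example (every element of A is 1.0, every element of B is 2.0, and k = 1000), each element of C should equal 2 * k = 2000, so comparing a few entries against that value or against reference_gemm on a smaller problem is usually enough for a sanity check.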

Using oneDNN for convolution:

#include <dnnl.hpp>
#include <iostream>
#include <vector>

Competitor Comparisons

llvm

Pros of LLVM

More comprehensive and mature codebase for compiler infrastructure
Broader community support and contributions
Wider range of supported languages and architectures

Cons of LLVM

Steeper learning curve for newcomers
Larger codebase, potentially more complex to navigate
May require more setup and configuration for specific use cases

Code Comparison

LLVM (C++):

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/IRBuilder.h"

using namespace llvm;  // the unqualified names below live in namespace llvm

LLVMContext Context;
Module *M = new Module("MyModule", Context);
IRBuilder<> Builder(Context);

oneAPI-samples (DPC++):

#include <sycl/sycl.hpp>
#include <array>
#include <iostream>

using namespace sycl;
queue q;
std::array<int, 10> data;

Summary

LLVM is a more extensive and versatile compiler infrastructure project, while oneAPI-samples focuses on providing examples and tutorials for Intel's oneAPI toolkit. LLVM offers broader language and architecture support but may be more challenging for beginners. oneAPI-samples is more specialized for Intel hardware and provides easier-to-follow examples for those working with oneAPI, but it has a narrower scope compared to LLVM.

compute-runtime

Pros of compute-runtime

Focuses on low-level GPU runtime implementation
Provides direct access to Intel GPU hardware capabilities
Suitable for advanced developers and system integrators

Cons of compute-runtime

Steeper learning curve for beginners
Less comprehensive documentation and examples
Narrower scope, primarily targeting Intel GPUs

Code Comparison

compute-runtime (OpenCL kernel):

__kernel void vector_add(__global const int *A, __global const int *B, __global int *C) {
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}

oneAPI-samples (DPC++ kernel):

h.parallel_for(range<1>(N), [=](id<1> i) {
    C[i] = A[i] + B[i];
});

Summary

compute-runtime is a lower-level implementation focusing on Intel GPU runtimes, while oneAPI-samples provides a broader set of examples and tutorials for the oneAPI ecosystem. compute-runtime offers more direct hardware access but requires more expertise, whereas oneAPI-samples is more beginner-friendly and covers a wider range of Intel hardware. The code comparison shows the difference between OpenCL and DPC++ implementations, with oneAPI-samples using a higher-level abstraction.

hip

Pros of HIP

Open-source and vendor-neutral, supporting both AMD and NVIDIA GPUs
Easier migration path from CUDA to HIP for existing CUDA applications
Lightweight runtime library with minimal overhead

Cons of HIP

Smaller ecosystem and community compared to oneAPI
Limited support for non-GPU accelerators (e.g., FPGAs, specialized AI hardware)
Less comprehensive toolset for heterogeneous computing

Code Comparison

HIP code example:

#include <hip/hip_runtime.h>

__global__ void vectorAdd(float *a, float *b, float *c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

oneAPI code example:

#include <sycl/sycl.hpp>

void vectorAdd(sycl::queue &q, float *a, float *b, float *c, int n) {
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    });
}

The HIP example uses CUDA-like syntax with explicit kernel definition, while the oneAPI example uses SYCL with a more abstracted parallel_for construct. HIP provides a familiar environment for CUDA developers, whereas oneAPI offers a higher-level abstraction for heterogeneous computing across various accelerators.

cuda-samples

Pros of cuda-samples

More extensive collection of samples covering a wider range of CUDA features and applications
Longer history and more mature codebase, reflecting CUDA's established position in GPU computing
Includes advanced topics like multi-GPU programming and CUDA-specific optimizations

Cons of cuda-samples

Limited to NVIDIA GPUs, lacking cross-platform compatibility
Steeper learning curve for beginners due to CUDA's lower-level nature
Less focus on modern C++ features and programming paradigms

Code Comparison

cuda-samples (CUDA):

__global__ void vectorAdd(const float *A, const float *B, float *C, int numElements) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements) {
        C[i] = A[i] + B[i];
    }
}

oneAPI-samples (DPC++):

q.parallel_for(sycl::range<1>(numElements), [=](sycl::id<1> i) {
    C[i] = A[i] + B[i];
});

The CUDA sample uses explicit kernel definition and thread indexing, while the oneAPI sample uses a higher-level abstraction with parallel_for and lambda functions, showcasing the difference in programming models between CUDA and oneAPI.
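
To make the difference in indexing concrete, here is a minimal host-only sketch, assuming a serial emulation in which ordinary loops stand in for the GPU grid. Both function names are hypothetical and nothing here calls CUDA or SYCL. The first function mirrors the CUDA pattern (a rounded-up number of blocks, a computed global index, and a bounds guard); the second mirrors the range-based pattern, where the index space is exactly the number of elements and no guard is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical emulation of the CUDA-style launch: the outer loop plays the
// role of blockIdx.x and the inner loop plays the role of threadIdx.x.
void vector_add_grid_emulated(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::vector<float>& C,
                              std::size_t block_dim) {
    assert(block_dim > 0 && A.size() == B.size() && A.size() == C.size());
    const std::size_t n = C.size();
    // Round up so that the last, partially filled block is still launched.
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            const std::size_t i = block_dim * block_idx + thread_idx;
            if (i < n) {  // guard: the grid may be larger than the data
                C[i] = A[i] + B[i];
            }
        }
    }
}

// Hypothetical emulation of the range-based launch: one work-item per element.
void vector_add_range_emulated(const std::vector<float>& A,
                               const std::vector<float>& B,
                               std::vector<float>& C) {
    assert(A.size() == B.size() && A.size() == C.size());
    for (std::size_t i = 0; i < C.size(); ++i) {
        C[i] = A[i] + B[i];
    }
}
```

Both functions produce the same output for any positive block size; the only difference is who is responsible for mapping work-items to element indices.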

tensorflow

Pros of TensorFlow

Widely adopted and supported by a large community
Comprehensive ecosystem with tools like TensorBoard and TensorFlow Serving
Supports multiple programming languages (Python, JavaScript, C++)

Cons of TensorFlow

Steeper learning curve for beginners
Can be slower for prototyping compared to other frameworks
Large library size and potential overhead for simpler projects

Code Comparison

TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

oneAPI-samples:

#include <sycl/sycl.hpp>
#include <array>
#include <iostream>

constexpr size_t array_size = 10000;

Summary

TensorFlow is a comprehensive machine learning framework with a large ecosystem and community support. It offers more features and flexibility but may have a steeper learning curve. oneAPI-samples, on the other hand, focuses on demonstrating Intel's oneAPI toolkit capabilities across various hardware accelerators. It provides examples for heterogeneous computing but is more specialized compared to TensorFlow's general-purpose machine learning capabilities.

pytorch

Pros of PyTorch

Larger community and ecosystem, with more resources and third-party libraries
More flexible and dynamic computational graph, allowing for easier debugging
Broader application range, including computer vision, NLP, and reinforcement learning

Cons of PyTorch

Steeper learning curve for beginners compared to oneAPI-samples
Less optimized for Intel hardware, potentially slower on Intel CPUs and GPUs
More complex setup and installation process

Code Comparison

PyTorch example:

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)

oneAPI-samples example:

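#include <sycl/sycl.hpp>
using namespace sycl;

constexpr size_t N = 1024;  // element count, chosen arbitrarily for illustration

queue q;
int* data = malloc_shared<int>(N, q);
q.parallel_for(range<1>(N), [=](id<1> i) {
    data[i] = i;
}).wait();
free(data, q);  // release the shared allocation once it is no longer needed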

The PyTorch example demonstrates simple tensor operations, while the oneAPI-samples code shows parallel processing using SYCL. PyTorch offers a more intuitive API for machine learning tasks, whereas oneAPI-samples provides lower-level control for heterogeneous computing across various Intel architectures.

README

oneAPI Samples

The oneAPI-samples repository contains samples for the Intel® oneAPI Toolkits.

The contents of the default branch in this repository are meant to be used with the most recent released version of the Intel® oneAPI Toolkits.

Find oneAPI Samples

You can find samples by browsing the oneAPI Samples Catalog. Using the catalog you can search on the sample titles or descriptions.

You can refine your browsing or searching through filtering on the following:

Expertise (Getting Started, Tutorial, etc.)
Programming language (C++, Python, or Fortran)
Target device (CPU, GPU, and FPGA)

Get the oneAPI Samples

Clone the repository by entering the following command:

git clone https://github.com/oneapi-src/oneAPI-samples.git

Alternatively, you can download a zip file containing the primary branch of the repository.

Click the Code button.
Select Download ZIP from the menu options.
After downloading the file, unzip the repository contents.

Get Earlier Versions of the oneAPI Samples

If you need samples for an earlier version of any of the Intel® oneAPI Toolkits, then use a tagged version of the repository that corresponds with the toolkit version.

Clone an earlier version of the repository using Git by entering a command similar to the following:

git clone -b <tag> https://github.com/oneapi-src/oneAPI-samples.git

where <tag> is the GitHub tag corresponding to the toolkit version number, like 2025.2.0.

Alternatively, you can download a zip file containing a specific tagged version of the repository.

Select the appropriate tag.
Click the Code button.
Select Download ZIP from the menu options.
After downloading the file, unzip the repository contents.

Getting Started with oneAPI Samples

The best oneAPI sample to start with depends on what you are trying to learn or the types of problems you are trying to solve.

If you want to learn about...	Start with...
the basics of writing, compiling, and building programs for CPUs and GPUs	Simple Add or Vector Add samples (You can use these samples as starter projects by removing unwanted elements and adding your code and build requirements.)
the basics of using artificial intelligence	Getting Started Samples for AI Tools

Note: The README.md included with each sample provides build instructions for all supported operating systems. For samples that run in Jupyter Notebooks, you may need to install or configure additional frameworks or package managers if you do not already have them on your system.

Using Integrated Development Environments (IDE)

If you prefer to use an Integrated Development Environment (IDE) with these samples, you can download Visual Studio Code for use on Windows or Linux.

Repository Structure

The oneAPI-samples repository is organized by high-level categories.

Platform Validation

Ubuntu 22.04.3

Intel(R) Xeon(R) Platinum 8468V
Intel(R) Data Center GPU Max 1100
OpenCL Driver: Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8468V OpenCL 3.0 (Build 0) [2025.20.5.0.15_220340]
Level Zero Driver: Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1100 12.60.7 [1.3.27642]
oneAPI package version:
‐ Intel oneAPI HPC Toolkit Build Version: 2025.2.0.463

Known Issues and Limitations

Windows

If you are using Microsoft Visual Studio* 2019, you must use Microsoft Visual Studio 2019 version 16.4.0 or newer.
If you encounter Error MSB6003 The specified task executable ... could not be run... when building a sample program, it might be due to the length of the directory path. Move the build directory to a location with a shorter path. Build the sample in the new location.

Additional Resources for Code Samples

This section is a curated list of samples from oneAPI-based projects, libraries, and tools. In addition, notable samples from other AI projects that are not necessarily based on oneAPI are listed here as further resources.

OpenVINO™ notebooks: A collection of ready-to-run Jupyter notebooks for learning and experimenting with the OpenVINO™ Toolkit, an open-source AI toolkit that makes it easier to write once, deploy anywhere. The notebooks introduce OpenVINO basics and teach developers how to leverage the API for optimized deep learning inference.
Intel® Gaudi® Tutorials: Tutorials with step-by-step instructions for running PyTorch and PyTorch Lightning models on the Intel Gaudi AI Processor for training and inferencing, from beginner level to advanced users.
Powered-by-Intel Leaderboard: This leaderboard celebrates and increases the discoverability of models developed on Intel hardware by the AI developer community. We provide developers with sample code and resources (developer programs) to deploy (inference) AI PC, Intel® Xeon® Scalable processors, Intel® Gaudi® processors, Intel® Arc™ GPUs, and Intel® Data Center GPUs.
Intel® AI Reference Models: This repository contains links to pre-trained models, sample scripts, best practices, and step-by-step tutorials for many popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs.
awesome-oneapi: A community sourced list of awesome oneAPI and SYCL projects for solutions across a wide range of industry segments.
Generative AI Examples: A collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project. OPEA is an ecosystem orchestration framework to integrate performant GenAI technologies & workflows leading to quicker GenAI adoption and business value.

Licenses

Code samples are licensed under the MIT license. See License.txt for details.

Third-party program licenses can be found here: third-party-programs.txt.

Notices and Disclaimers

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
