ck

Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance concurrent systems developed in C99+.

2,517

324

2,517

View on GitHub

Top Related Projects

junction

1,430

Concurrent data structures in C++

MPMCQueue

1,322

A bounded multi-producer multi-consumer concurrent queue written in C++11

concurrentqueue

11,197

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

folly

29,642

An open-source C++ library developed and used at Facebook.

oneTBB

6,368

oneAPI Threading Building Blocks (oneTBB)

Quick Overview

Concurrency Kit (ck) is a high-performance concurrency library for C and C++. It provides a collection of lock-free and concurrent data structures, as well as low-level synchronization primitives designed for scalability and efficiency in multi-threaded environments.

Pros

Highly optimized for performance and scalability
Extensive collection of lock-free data structures
Cross-platform support (Linux, BSD, Windows, macOS)
Well-documented with thorough API references

Cons

Steep learning curve for developers unfamiliar with lock-free programming
Limited higher-level abstractions compared to some other concurrency libraries
Requires careful usage to avoid subtle concurrency bugs
May be overkill for simpler concurrent applications

Code Examples

Using a lock-free stack:

#include <ck_stack.h>

ck_stack_t my_stack;
ck_stack_entry_t element;

// Initialize the stack
ck_stack_init(&my_stack);

// Push an element onto the stack
ck_stack_push_upmc(&my_stack, &element);

// Pop an element from the stack
ck_stack_entry_t *popped = ck_stack_pop_upmc(&my_stack);

Implementing a spinlock:

#include <ck_spinlock.h>

ck_spinlock_t my_lock = CK_SPINLOCK_INITIALIZER;

// Acquire the lock
ck_spinlock_lock(&my_lock);

// Critical section
// ...

// Release the lock
ck_spinlock_unlock(&my_lock);

Using an atomic operation:

#include <ck_pr.h>

uint64_t value = 0;

// Atomically increment the value
ck_pr_faa_64(&value, 1);

// Atomically compare and swap
uint64_t expected = 1;
uint64_t new_value = 2;
bool success = ck_pr_cas_64(&value, expected, new_value);

Getting Started

To use Concurrency Kit in your project:

Download and install the library from the official repository.
Include the necessary headers in your C/C++ file:

#include <ck_stack.h>
#include <ck_spinlock.h>
#include <ck_pr.h>

Compile your program with the Concurrency Kit library:

gcc -o my_program my_program.c -lck

Start using the Concurrency Kit data structures and primitives in your code as shown in the examples above.

Competitor Comparisons

junction

1,430

Concurrent data structures in C++

Pros of Junction

More focused on specific concurrent data structures like queues and hash tables
Provides detailed documentation and examples for each data structure
Designed with C++11 features in mind, offering modern C++ compatibility

Cons of Junction

Smaller community and less frequent updates compared to CK
Limited to C++ language, while CK supports C and can be used in C++ projects
Narrower scope of concurrent primitives compared to CK's broader offering

Code Comparison

Junction queue usage:

junction::ConcurrentQueue<int> queue;
queue.push(42);
int value;
if (queue.try_pop(value)) {
    // Use value
}

CK queue usage:

ck_queue_t queue;
ck_queue_init(&queue);
ck_queue_enqueue(&queue, &item);
void *value = ck_queue_dequeue(&queue);

Both libraries provide efficient concurrent data structures, but Junction focuses on C++ specific implementations while CK offers a broader range of primitives in C. CK has a larger community and more frequent updates, making it potentially more suitable for projects requiring long-term support and a wider range of concurrent operations. Junction, however, might be preferable for modern C++ projects seeking specialized concurrent data structures with detailed documentation.

MPMCQueue

1,322

A bounded multi-producer multi-consumer concurrent queue written in C++11

Pros of MPMCQueue

Focused implementation of a multi-producer, multi-consumer queue
Designed for low-latency and high-throughput scenarios
Simple API with clear documentation

Cons of MPMCQueue

Limited scope compared to CK's broader concurrency toolkit
May require additional synchronization primitives for complex use cases
Less mature and less widely adopted than CK

Code Comparison

MPMCQueue:

MPMCQueue<int> q(100);
q.try_push(42);
int value;
q.try_pop(value);

CK:

ck_ring_t ring;
ck_ring_buffer_t buffer[SIZE];
ck_ring_init(&ring, buffer, SIZE);
ck_ring_enqueue_spsc(&ring, buffer, &data);
ck_ring_dequeue_spsc(&ring, buffer, &result);

Summary

MPMCQueue offers a specialized, high-performance queue implementation, while CK provides a comprehensive concurrency toolkit. MPMCQueue is ideal for specific use cases requiring a multi-producer, multi-consumer queue, whereas CK offers a broader range of concurrency primitives and data structures. The choice between the two depends on the specific requirements of the project and the desired level of abstraction.

concurrentqueue

11,197

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

Pros of concurrentqueue

Focused specifically on lock-free queues, providing a simpler API for queue operations
Header-only library, making it easier to integrate into existing projects
Extensive benchmarking and performance optimizations for queue operations

Cons of concurrentqueue

Limited scope compared to CK, which offers a broader range of concurrency primitives
Less flexibility for customizing low-level concurrency behavior
Fewer options for atomic operations and memory ordering

Code Comparison

CK example (atomic increment):

ck_pr_inc_uint(&value);

concurrentqueue example (enqueue operation):

moodycamel::ConcurrentQueue<int> q;
q.enqueue(42);

Summary

CK provides a comprehensive toolkit for concurrency primitives, offering more flexibility and control over low-level operations. concurrentqueue, on the other hand, focuses specifically on efficient, lock-free queue implementations with a simpler API. CK is better suited for projects requiring a wide range of concurrency tools, while concurrentqueue excels in scenarios where high-performance queues are the primary concern.

folly

29,642

An open-source C++ library developed and used at Facebook.

Pros of Folly

Broader scope: Folly is a comprehensive C++ library with a wide range of utilities, while CK focuses primarily on concurrency primitives
Active development: Folly has more frequent updates and a larger contributor base
Integration with other Facebook projects: Seamlessly works with other Facebook open-source libraries

Cons of Folly

Larger footprint: Folly's extensive feature set results in a larger codebase and potential overhead
Steeper learning curve: The breadth of functionality may require more time to master compared to CK's focused approach

Code Comparison

CK (Atomic operations):

ck_pr_cas_ptr(&target, old_value, new_value);

Folly (Atomic operations):

folly::atomic_compare_exchange_strong(&target, &old_value, new_value);

Both libraries provide atomic operations, but Folly's implementation aligns more closely with C++11 standards.

Summary

Folly offers a more comprehensive set of utilities and active development, making it suitable for large-scale projects. CK, on the other hand, provides a focused, lightweight solution for concurrency primitives. The choice between the two depends on project requirements, team expertise, and performance considerations.

oneTBB

6,368

oneAPI Threading Building Blocks (oneTBB)

Pros of oneTBB

More comprehensive parallel programming library with a wider range of features
Better integration with C++ standard library and modern C++ practices
Extensive documentation and support from Intel

Cons of oneTBB

Larger footprint and potentially more complex to use for simple tasks
May have more overhead for basic operations compared to CK
Less suitable for low-level, performance-critical systems programming

Code Comparison

oneTBB example (parallel for loop):

#include <tbb/parallel_for.h>
#include <vector>

void process(std::vector<int>& data) {
    tbb::parallel_for(0, data.size(), [&](int i) {
        data[i] = data[i] * 2;
    });
}

CK example (lock-free stack):

#include <ck_stack.h>

ck_stack_t stack;
ck_stack_entry_t entry;

ck_stack_init(&stack);
ck_stack_push_upmc(&stack, &entry);
ck_stack_entry_t *popped = ck_stack_pop_upmc(&stack);

The code examples highlight the different focus areas of the libraries. oneTBB provides high-level abstractions for parallel algorithms, while CK offers low-level concurrent data structures and primitives. oneTBB's example shows its integration with C++ features, whereas CK's example demonstrates its C-based, lock-free approach.

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

Concurrency Kit

Modern concurrency primitives and building blocks for high performance applications.

Continuous Integration

Compilers tested in the past include gcc, clang, cygwin, icc, mingw32, mingw64 and suncc across all supported architectures. All new architectures are required to pass the integration test and under-go extensive code review.

Continuous integration is currently enabled for the following targets:

darwin/clang/arm64
freebsd/clang/x86-64
linux/gcc/arm64
linux/gcc/x86-64
linux/clang/x86-64

Compile and Build

Step 1. ./configure For additional options try ./configure --help
Step 2. In order to compile regressions (requires POSIX threads) use make regressions. In order to compile libck use make all or make.
Step 3. In order to install use make install To uninstall use make uninstall.

See http://concurrencykit.org/ for more information.

Supported Architectures

Concurrency Kit supports any architecture using compiler built-ins as a fallback. There is usually a performance degradation associated with this.

Concurrency Kit has specialized assembly for the following architectures:

aarch64
arm
ppc
ppc64
riscv64
s390x
sparcv9+
x86
x86_64

Features

Concurrency Primitives

ck_pr

Concurrency primitives including architecture-specific ones. Provides wrappers around CAS in case of missing native support. This also provides support for RTM (transactional memory), pipeline control, read-for-ownership and more.

ck_backoff

A simple and efficient (minimal noise) backoff function.

ck_cc

Abstracted compiler builtins when writing efficient concurrent data structures.

Safe Memory Reclamation

ck_epoch

A scalable safe memory reclamation mechanism with support for idle threads and various optimizations that make it better than or competitive with many state-of-the-art solutions.

ck_hp

Implements support for hazard pointers, a simple and efficient lock-free safe memory reclamation mechanism.

Data Structures

ck_array

A simple concurrently-readable pointer array structure.

ck_bitmap

An efficient multi-reader and multi-writer concurrent bitmap structure.

ck_ring

Efficient concurrent bounded FIFO data structures with various performance trade-off. This includes specialization for single-reader, many-reader, single-writer and many-writer.

ck_fifo

A reference implementation of the first published lock-free FIFO algorithm, with specialization for single-enqueuer-single-dequeuer and many-enqueuer-single-dequeuer and extensions to allow for node re-use.

ck_hp_fifo

A reference implementation of the above algorithm, implemented with safe memory reclamation using hazard pointers.

ck_hp_stack

A reference implementation of a Treiber stack with support for hazard pointers.

ck_stack

A reference implementation of an efficient lock-free stack, with specialized variants for a variety of memory management strategies and bounded concurrency.

ck_queue

A concurrently readable friendly derivative of the BSD-queue interface. Coupled with a safe memory reclamation mechanism, implement scalable read-side queues with a simple search and replace.

ck_hs

An extremely efficient single-writer-many-reader hash set, that satisfies lock-freedom with bounded concurrency without any usage of atomic operations and allows for recycling of unused or deleted slots. This data structure is recommended for use as a general hash-set if it is possible to compute values from keys.

ck_ht

A specialization of the ck_hs algorithm allowing for disjunct key-value pairs.

ck_rhs

A variant of ck_hs that utilizes robin-hood hashing to allow for improved performance with higher load factors and high deletion rates.

Synchronization Primitives

ck_ec

An extremely efficient event counter implementation, a better alternative to condition variables with specialization for fixed concurrency use-cases.

ck_barrier

A plethora of execution barriers including: centralized barriers, combining barriers, dissemination barriers, MCS barriers, tournament barriers.

ck_brlock

A simple big-reader lock implementation, write-biased reader-writer lock with scalable read-side locking.

ck_bytelock

An implementation of bytelocks, for research purposes, allowing for (in theory), fast read-side acquisition without the use of atomic operations. In reality, memory barriers are required on the fast path.

ck_cohort

A generic lock cohorting interface, allows you to turn any lock into a NUMA-friendly scalable NUMA lock. There is a significant trade-off in fast path acquisition cost. Specialization is included for all relevant lock implementations in Concurrency Kit. Learn more by reading "Lock Cohorting: A General Technique for Designing NUMA Locks".

ck_elide

A generic lock elision framework, allows you to turn any lock implementation into an elision-aware implementation. This requires support for restricted transactional memory by the underlying hardware.

ck_pflock

Phase-fair reader-writer mutex that provides strong fairness guarantees between readers and writers. Learn more by reading "Spin-Based Reader-Writer Synchronization for Multiprocessor Real-Time Systems".

ck_rwcohort

A generic read-write lock cohorting interface, allows you to turn any read-write lock into a NUMA-friendly scalable NUMA lock. There is a significant trade-off in fast path acquisition cost. Specialization is included for all relevant lock implementations in Concurrency Kit. Learn more by reading "Lock Cohorting: A General Technique for Designing NUMA Locks".

ck_rwlock

A simple centralized write-biased read-write lock.

ck_sequence

A sequence counter lock, popularized by the Linux kernel, allows for very fast read and write synchronization for simple data structures where deep copy is permitted.

ck_swlock

A single-writer specialized read-lock that is copy-safe, useful for data structures that must remain small, be copied and contain in-band mutexes.

ck_tflock

Task-fair locks are fair read-write locks, derived from "Scalable reader-writer synchronization for shared-memory multiprocessors".

ck_spinlock

A basic but very fast spinlock implementation.

ck_spinlock_anderson

Scalable and fast anderson spinlocks. This is here for reference, one of the earliest scalable and fair lock implementations.

ck_spinlock_cas

A basic spinlock utilizing compare_and_swap.

ck_spinlock_dec

A basic spinlock, a C adaption of the older optimized Linux kernel spinlock for x86. Primarily here for reference.

ck_spinlock_fas

A basic spinlock utilizing atomic exchange.

ck_spinlock_clh

An efficient implementation of the scalable CLH lock, providing many of the same performance properties of MCS with a better fast-path.

ck_spinlock_hclh

A NUMA-friendly CLH lock.

ck_spinlock_mcs

An implementation of the seminal scalable and fair MCS lock.

ck_spinlock_ticket

An implementation of fair centralized locks.

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot