Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance concurrent systems developed in C99+.
Modern concurrency primitives and building blocks for high performance applications.
Compilers tested in the past include gcc, clang, cygwin, icc, mingw32, mingw64 and suncc across all supported architectures. All new architectures are required to pass the integration test and undergo extensive code review.
Continuous integration is currently enabled for the following targets:
darwin/clang/arm64
freebsd/clang/x86-64
linux/gcc/arm64
linux/gcc/x86-64
linux/clang/x86-64
Step 1. Run `./configure`. For additional options, try `./configure --help`.
Step 2. To compile the regressions (requires POSIX threads), use `make regressions`. To compile libck, use `make all` or `make`.
Step 3. To install, use `make install`. To uninstall, use `make uninstall`.
See http://concurrencykit.org/ for more information.
Concurrency Kit supports any architecture using compiler built-ins as a fallback. There is usually a performance degradation associated with this.
Concurrency Kit has specialized assembly for the following architectures:
aarch64
arm
ppc
ppc64
riscv64
s390x
sparcv9+
x86
x86_64
Concurrency primitives including architecture-specific ones. Provides wrappers around CAS in case of missing native support. This also provides support for RTM (transactional memory), pipeline control, read-for-ownership and more.
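One way to read "wrappers around CAS" is that an atomic operation the target lacks can be emulated with a compare-and-swap loop. The sketch below illustrates that idea with C11 `<stdatomic.h>` rather than Concurrency Kit's own primitives; `faa_via_cas` is a hypothetical name and not part of the library.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (not part of Concurrency Kit): fetch-and-add
 * emulated with a compare-and-swap loop.
 */
static unsigned int
faa_via_cas(atomic_uint *target, unsigned int delta)
{
    unsigned int old = atomic_load_explicit(target, memory_order_relaxed);

    /* On failure, the CAS refreshes `old` with the current value. */
    while (!atomic_compare_exchange_weak(target, &old, old + delta))
        ;

    /* Like a native fetch-and-add, return the value seen before the add. */
    return old;
}
```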
A simple and efficient (minimal noise) backoff function.
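As an illustration of what a backoff function does, here is a minimal sketch of exponential backoff, assuming a doubling delay with a fixed ceiling. The names `backoff_t`, `backoff_spin`, `BACKOFF_INITIAL` and `BACKOFF_CEILING` are hypothetical, and the policy is an assumption, not a description of the library's implementation.

```c
#include <stdint.h>

#define BACKOFF_INITIAL 64u        /* assumed starting spin count */
#define BACKOFF_CEILING (1u << 16) /* assumed upper bound on the spin count */

typedef uint32_t backoff_t;

/*
 * Busy-wait for *b iterations, then double the delay for the next call
 * unless the ceiling has been reached. Returns the iterations spun.
 */
static uint32_t
backoff_spin(backoff_t *b)
{
    uint32_t spins = *b;

    /* The volatile counter keeps the compiler from removing the loop. */
    for (volatile uint32_t i = 0; i < spins; i++)
        ;

    if (*b < BACKOFF_CEILING)
        *b <<= 1;

    return spins;
}
```

A caller would typically start from `BACKOFF_INITIAL` and call `backoff_spin` after each failed attempt at a contended operation.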
Abstracted compiler builtins when writing efficient concurrent data structures.
A scalable safe memory reclamation mechanism with support for idle threads and various optimizations that make it better than or competitive with many state-of-the-art solutions.
Implements support for hazard pointers, a simple and efficient lock-free safe memory reclamation mechanism.
A simple concurrently-readable pointer array structure.
An efficient multi-reader and multi-writer concurrent bitmap structure.
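The general technique behind a multi-writer bitmap can be sketched with atomic OR and AND on word-sized cells. This is a minimal illustration using C11 atomics, with a fixed size and hypothetical names (`struct bitmap`, `bitmap_set`, `bitmap_reset`, `bitmap_test`); it is not the library's interface.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITMAP_BITS  256u /* assumed fixed capacity for this sketch */
#define WORD_BITS    (sizeof(unsigned int) * CHAR_BIT)
#define BITMAP_WORDS ((BITMAP_BITS + WORD_BITS - 1) / WORD_BITS)

struct bitmap {
    atomic_uint words[BITMAP_WORDS];
};

static void
bitmap_init(struct bitmap *bm)
{
    for (size_t i = 0; i < BITMAP_WORDS; i++)
        atomic_init(&bm->words[i], 0);
}

/* Set bit n; concurrent setters of other bits in the same word are safe. */
static void
bitmap_set(struct bitmap *bm, unsigned int n)
{
    atomic_fetch_or(&bm->words[n / WORD_BITS], 1u << (n % WORD_BITS));
}

/* Clear bit n without disturbing its neighbours. */
static void
bitmap_reset(struct bitmap *bm, unsigned int n)
{
    atomic_fetch_and(&bm->words[n / WORD_BITS], ~(1u << (n % WORD_BITS)));
}

static bool
bitmap_test(struct bitmap *bm, unsigned int n)
{
    return (atomic_load(&bm->words[n / WORD_BITS]) >> (n % WORD_BITS)) & 1u;
}
```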
Efficient concurrent bounded FIFO data structures with various performance trade-offs. This includes specializations for single-reader, many-reader, single-writer and many-writer use.
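To make the single-reader, single-writer case concrete, the following is a minimal sketch of a bounded ring buffer for exactly one producer and one consumer, written with C11 atomics. The names (`struct ring`, `ring_enqueue`, `ring_dequeue`, `RING_SIZE`) are hypothetical and the layout is an assumption made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u /* assumed capacity; must be a power of two */

struct ring {
    atomic_uint head; /* next slot to dequeue; written by the consumer */
    atomic_uint tail; /* next slot to enqueue; written by the producer */
    void *slots[RING_SIZE];
};

static void
ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false when the ring is full. */
static bool
ring_enqueue(struct ring *r, void *item)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_SIZE)
        return false;

    r->slots[tail & (RING_SIZE - 1)] = item;
    /* Publish the slot contents before the new tail becomes visible. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
static bool
ring_dequeue(struct ring *r, void **item)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;

    *item = r->slots[head & (RING_SIZE - 1)];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Because each index has a single writer, neither side needs a read-modify-write operation; many-reader or many-writer variants would.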
A reference implementation of the first published lock-free FIFO algorithm, with specialization for single-enqueuer-single-dequeuer and many-enqueuer-single-dequeuer and extensions to allow for node re-use.
A reference implementation of the above algorithm, implemented with safe memory reclamation using hazard pointers.
A reference implementation of a Treiber stack with support for hazard pointers.
A reference implementation of an efficient lock-free stack, with specialized variants for a variety of memory management strategies and bounded concurrency.
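For readers unfamiliar with the Treiber stack, the core push and pop loops can be sketched with C11 atomics as below. This is a bare illustration with hypothetical names (`struct node`, `struct stack`, `stack_push`, `stack_pop`). It deliberately ignores the ABA problem and memory reclamation, which is why real implementations pair the algorithm with a scheme such as hazard pointers.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

struct stack {
    _Atomic(struct node *) head;
};

static void
stack_init(struct stack *s)
{
    atomic_init(&s->head, NULL);
}

static void
stack_push(struct stack *s, struct node *n)
{
    struct node *old = atomic_load_explicit(&s->head, memory_order_relaxed);

    /* Link the node to the observed head, retrying if the head moved. */
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&s->head, &old, n,
        memory_order_release, memory_order_relaxed));
}

/*
 * Returns NULL when the stack is empty. Unsafe if nodes can be freed or
 * reused concurrently: old->next may be stale (ABA) or already released.
 */
static struct node *
stack_pop(struct stack *s)
{
    struct node *old = atomic_load_explicit(&s->head, memory_order_acquire);

    while (old != NULL &&
        !atomic_compare_exchange_weak_explicit(&s->head, &old, old->next,
        memory_order_acquire, memory_order_acquire))
        ;

    return old;
}
```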
A derivative of the BSD-queue interface that is friendly to concurrent readers. Coupled with a safe memory reclamation mechanism, it lets you implement scalable read-side queues with a simple search and replace.
An extremely efficient single-writer-many-reader hash set, that satisfies lock-freedom with bounded concurrency without any usage of atomic operations and allows for recycling of unused or deleted slots. This data structure is recommended for use as a general hash-set if it is possible to compute values from keys.
A specialization of the ck_hs algorithm allowing for disjunct key-value pairs.
A variant of ck_hs that utilizes robin-hood hashing to allow for improved performance with higher load factors and high deletion rates.
An extremely efficient event counter implementation, a better alternative to condition variables with specialization for fixed concurrency use-cases.
A plethora of execution barriers including: centralized barriers, combining barriers, dissemination barriers, MCS barriers, tournament barriers.
A simple big-reader lock implementation: a write-biased reader-writer lock with scalable read-side locking.
An implementation of bytelocks, for research purposes, allowing (in theory) for fast read-side acquisition without the use of atomic operations. In reality, memory barriers are required on the fast path.
A generic lock cohorting interface that allows you to turn any lock into a scalable, NUMA-friendly lock. There is a significant trade-off in fast path acquisition cost. Specialization is included for all relevant lock implementations in Concurrency Kit. Learn more by reading “Lock Cohorting: A General Technique for Designing NUMA Locks”.
A generic lock elision framework that allows you to turn any lock implementation into an elision-aware implementation. This requires support for restricted transactional memory by the underlying hardware.
Phase-fair reader-writer mutex that provides strong fairness guarantees between readers and writers. Learn more by reading “Spin-Based Reader-Writer Synchronization for Multiprocessor Real-Time Systems”.
A generic read-write lock cohorting interface that allows you to turn any read-write lock into a scalable, NUMA-friendly lock. There is a significant trade-off in fast path acquisition cost. Specialization is included for all relevant lock implementations in Concurrency Kit. Learn more by reading “Lock Cohorting: A General Technique for Designing NUMA Locks”.
A simple centralized write-biased read-write lock.
A sequence counter lock, popularized by the Linux kernel, that allows for very fast read and write synchronization for simple data structures where deep copy is permitted.
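The sequence counter idea can be sketched as follows: the writer makes the counter odd while it updates, and readers retry if the counter was odd or changed during their copy. This is a minimal single-writer illustration in C11 atomics with hypothetical names (`struct seq_point`, `seq_point_write`, `seq_point_read`). The data fields are relaxed atomics so that the racy read is well-defined in the C11 memory model, which is a choice made for this sketch.

```c
#include <stdatomic.h>

struct seq_point {
    atomic_uint seq; /* odd while a write is in progress */
    atomic_int x;
    atomic_int y;
};

static void
seq_point_init(struct seq_point *p)
{
    atomic_init(&p->seq, 0);
    atomic_init(&p->x, 0);
    atomic_init(&p->y, 0);
}

/* Assumes a single writer; concurrent writers would need a lock. */
static void
seq_point_write(struct seq_point *p, int x, int y)
{
    unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
    /* Order the odd counter before the data stores. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->x, x, memory_order_relaxed);
    atomic_store_explicit(&p->y, y, memory_order_relaxed);
    /* Publish the data, then make the counter even again. */
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Copy out a consistent snapshot, retrying if a write overlapped. */
static void
seq_point_read(struct seq_point *p, int *x, int *y)
{
    unsigned int before, after;

    do {
        before = atomic_load_explicit(&p->seq, memory_order_acquire);
        *x = atomic_load_explicit(&p->x, memory_order_relaxed);
        *y = atomic_load_explicit(&p->y, memory_order_relaxed);
        /* Order the data loads before the second counter read. */
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while (before != after || (before & 1u));
}
```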
A single-writer specialized read-lock that is copy-safe, useful for data structures that must remain small, be copied and contain in-band mutexes.
Task-fair locks are fair read-write locks, derived from “Scalable reader-writer synchronization for shared-memory multiprocessors”.
A basic but very fast spinlock implementation.
Scalable and fast Anderson spinlocks. These are included for reference, as one of the earliest scalable and fair lock implementations.
A basic spinlock utilizing compare-and-swap.
A basic spinlock, a C adaptation of the older optimized Linux kernel spinlock for x86. Primarily here for reference.
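A compare-and-swap spinlock can be sketched in a few lines of C11. The names below (`struct cas_lock`, `cas_lock_try`, `cas_lock_acquire`, `cas_lock_release`) are hypothetical and only illustrate the technique; they are not the library's interface.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct cas_lock {
    atomic_uint locked; /* 0 = free, 1 = held */
};

static void
cas_lock_init(struct cas_lock *l)
{
    atomic_init(&l->locked, 0);
}

/* One acquisition attempt: succeeds only if the lock was free. */
static bool
cas_lock_try(struct cas_lock *l)
{
    unsigned int expected = 0;

    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void
cas_lock_acquire(struct cas_lock *l)
{
    while (!cas_lock_try(l)) {
        /* Spin on plain loads to avoid hammering the line with CAS. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
    }
}

static void
cas_lock_release(struct cas_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```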
A basic spinlock utilizing atomic exchange.
An efficient implementation of the scalable CLH lock, providing many of the same performance properties as MCS with a better fast path.
A NUMA-friendly CLH lock.
An implementation of the seminal scalable and fair MCS lock.
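The MCS algorithm queues waiters in a linked list so that each thread spins on its own node instead of a shared word. The sketch below is a textbook-style rendering in C11 atomics with hypothetical names (`struct mcs_lock`, `struct mcs_node`, `mcs_acquire`, `mcs_release`); it is not taken from the library and omits padding and backoff.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked; /* true while this waiter must keep spinning */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail; /* last waiter, or NULL when free */
};

static void
mcs_init(struct mcs_lock *l)
{
    atomic_init(&l->tail, NULL);
}

/* Each caller supplies its own node, which must outlive the critical section. */
static void
mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
{
    struct mcs_node *prev;

    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Append ourselves to the queue. */
    prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty: lock acquired */

    /* Link behind the predecessor and spin on our own flag. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void
mcs_release(struct mcs_lock *l, struct mcs_node *me)
{
    struct mcs_node *next = atomic_load_explicit(&me->next, memory_order_acquire);

    if (next == NULL) {
        struct mcs_node *expected = me;

        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
            memory_order_release, memory_order_relaxed))
            return;

        /* A successor is mid-enqueue; wait for it to link itself. */
        while ((next = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            ;
    }

    /* Hand the lock to the successor. */
    atomic_store_explicit(&next->locked, false, memory_order_release);
}
```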
An implementation of fair centralized locks.
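A ticket lock is one well-known example of a fair centralized lock: threads take a ticket and are served in FIFO order. The sketch below assumes that reading of the description and uses C11 atomics with hypothetical names (`struct ticket_lock`, `ticket_acquire`, `ticket_release`).

```c
#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint serving; /* ticket currently allowed to enter */
};

static void
ticket_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->serving, 0);
}

static void
ticket_acquire(struct ticket_lock *l)
{
    /* Take a ticket; arrival order determines service order. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
        memory_order_relaxed);

    while (atomic_load_explicit(&l->serving, memory_order_acquire) != me)
        ;
}

static void
ticket_release(struct ticket_lock *l)
{
    /* Only the lock holder writes `serving`, so a plain increment suffices. */
    unsigned int cur = atomic_load_explicit(&l->serving, memory_order_relaxed);

    atomic_store_explicit(&l->serving, cur + 1, memory_order_release);
}
```

All waiters spin on the same `serving` word, which is what makes the lock centralized and is the main scalability difference from queue locks.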