Top Related Projects
Quick Overview
Klib is a standalone and lightweight C library that provides efficient implementations of various algorithms and data structures. It focuses on performance and simplicity, offering a collection of useful tools for C programmers without the need for external dependencies.
Pros
- Lightweight and standalone, with no external dependencies
- High-performance implementations of common algorithms and data structures
- Easy to integrate into existing C projects
- Well-documented and actively maintained
Cons
- Limited scope compared to more comprehensive libraries
- Requires manual memory management in some cases
- May not be as feature-rich as larger, more specialized libraries
- Primarily focused on C, with limited support for other languages
Code Examples
- Using the khash hash table:
#include "khash.h"
KHASH_MAP_INIT_INT(m32, char)
int main() {
int ret, is_missing;
khint_t k;
khash_t(m32) *h = kh_init(m32);
k = kh_put(m32, h, 5, &ret);
kh_value(h, k) = 10;
k = kh_get(m32, h, 10);
is_missing = (k == kh_end(h));
k = kh_get(m32, h, 5);
kh_del(m32, h, k);
kh_destroy(m32, h);
return 0;
}
- Using the ksort sorting function:
#include "ksort.h"
KSORT_INIT_GENERIC(int)
int main() {
int arr[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
int n = sizeof(arr) / sizeof(arr[0]);
ks_introsort(int, n, arr);
return 0;
}
- Using the kstring string manipulation functions:
#include "kstring.h"
int main() {
kstring_t str = {0, 0, 0};
ksprintf(&str, "Hello, %s!", "world");
kputs("How are you?", &str);
printf("%s\n", str.s);
free(str.s);
return 0;
}
Getting Started
To use Klib in your project:
-
Clone the repository:
git clone https://github.com/attractivechaos/klib.git
-
Include the desired header files in your C source code:
#include "klib/khash.h" #include "klib/ksort.h" #include "klib/kstring.h"
-
Compile your program with the Klib headers:
gcc -I/path/to/klib your_program.c -o your_program
Note: Klib is header-only, so you don't need to link against any libraries.
Competitor Comparisons
Toolkit for processing sequences in FASTA/Q formats
Pros of seqtk
- Specialized for sequence processing tasks in bioinformatics
- Includes a wide range of sequence manipulation tools
- Optimized for handling large genomic datasets efficiently
Cons of seqtk
- More focused scope compared to the general-purpose klib
- May have a steeper learning curve for non-bioinformaticians
- Less suitable for general-purpose programming tasks
Code Comparison
seqtk (sequence processing):
while ((l = kseq_read(seq)) >= 0) {
if (seq->qual.l) {
for (i = 0; i < seq->qual.l; ++i)
seq->qual.s[i] = q_int2char(33 + (seq->qual.s[i] - 33 + q) / 2);
}
puts(seq->name.s);
puts(seq->seq.s);
if (seq->qual.l) puts(seq->qual.s);
}
klib (general-purpose library):
#include "khash.h"
KHASH_MAP_INIT_STR(str, int)
khash_t(str) *h;
h = kh_init(str);
k = kh_put(str, h, "key", &ret);
kh_value(h, k) = 1;
kh_del(str, h, k);
kh_destroy(str, h);
The code snippets demonstrate the specialized nature of seqtk for sequence processing, while klib provides more general-purpose data structures and algorithms.
A versatile pairwise aligner for genomic and spliced nucleotide sequences
Pros of minimap2
- Specialized for long-read sequence alignment and mapping
- Optimized for performance with large genomic datasets
- Actively maintained and widely used in bioinformatics
Cons of minimap2
- More complex and domain-specific than klib
- Larger codebase and potentially steeper learning curve
- Less suitable for general-purpose programming tasks
Code Comparison
minimap2:
int mm_idx_gen(mm_idx_t *mi, int n_threads)
{
int i, j;
mm_idx_bucket_t *b;
if (mi->h == 0) return 0;
for (i = 0; i < 1<<mi->b; ++i) {
b = &mi->B[i];
klib:
#define KSORT_INIT(name, type_t, __sort_lt) \
KSORT_INIT_GENERIC(name, type_t, __sort_lt, __ks_insertsort_##name, __ks_combsort_##name, __ks_mergesort_##name, __ks_introsort_##name)
#define ks_lt_generic(a, b) ((a) < (b))
minimap2 is tailored for genomic sequence alignment, featuring specialized data structures and algorithms. klib, on the other hand, provides generic macros and functions for broader use cases. The code snippets illustrate this difference, with minimap2 showing a specific indexing function, while klib demonstrates a generic sorting macro.
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual CopilotREADME
Klib: a Generic Library in C
Overview
Klib is a standalone and lightweight C library distributed under MIT/X11 license. Most components are independent of external libraries, except the standard C library, and independent of each other. To use a component of this library, you only need to copy a couple of files to your source code tree without worrying about library dependencies.
Klib strives for efficiency and a small memory footprint. Some components, such as khash.h, kbtree.h, ksort.h and kvec.h, are among the most efficient implementations of similar algorithms or data structures in all programming languages, in terms of both speed and memory use.
A new documentation is available here which includes most information in this README file.
Common components
- khash.h: generic hash table with open addressing.
- kbtree.h: generic search tree based on B-tree.
- kavl.h: generic intrusive AVL tree.
- ksort.h: generic sort, including introsort, merge sort, heap sort, comb sort, Knuth shuffle and the k-small algorithm.
- kseq.h: generic stream buffer and a FASTA/FASTQ format parser.
- kvec.h: generic dynamic array.
- klist.h: generic single-linked list and memory pool.
- kstring.{h,c}: basic string library.
- kmath.{h,c}: numerical routines including MT19937-64 pseudorandom generator, basic nonlinear programming and a few special math functions.
- ketopt.h: portable command-line argument parser with getopt_long-like API.
Components for more specific use cases
- ksa.c: constructing suffix arrays for strings with multiple sentinels, based on a revised SAIS algorithm.
- knetfile.{h,c}: random access to remote files on HTTP or FTP.
- kopen.c: smart stream opening.
- khmm.{h,c}: basic HMM library.
- ksw.(h,c}: Striped Smith-Waterman algorithm.
- knhx.{h,c}: Newick tree format parser.
Methodology
For the implementation of generic containers, klib extensively uses C
macros. To use these data structures, we usually need to instantiate methods by
expanding a long macro. This makes the source code look unusual or even ugly
and adds difficulty to debugging. Unfortunately, for efficient generic
programming in C that lacks template, using macros is the only
solution. Only with macros, we can write a generic container which, once
instantiated, compete with a type-specific container in efficiency. Some
generic libraries in C, such as Glib, use the void*
type to implement
containers. These implementations are usually slower and use more memory than
klib (see this benchmark).
To effectively use klib, it is important to understand how it achieves generic programming. We will use the hash table library as an example:
#include "khash.h"
KHASH_MAP_INIT_INT(m32, char) // instantiate structs and methods
int main() {
int ret, is_missing;
khint_t k;
khash_t(m32) *h = kh_init(m32); // allocate a hash table
k = kh_put(m32, h, 5, &ret); // insert a key to the hash table
if (!ret) kh_del(m32, h, k);
kh_value(h, k) = 10; // set the value
k = kh_get(m32, h, 10); // query the hash table
is_missing = (k == kh_end(h)); // test if the key is present
k = kh_get(m32, h, 5);
kh_del(m32, h, k); // remove a key-value pair
for (k = kh_begin(h); k != kh_end(h); ++k) // traverse
if (kh_exist(h, k)) // test if a bucket contains data
kh_value(h, k) = 1;
kh_destroy(m32, h); // deallocate the hash table
return 0;
}
In this example, the second line instantiates a hash table with unsigned
as
the key type and char
as the value type. m32
names such a type of hash table.
All types and functions associated with this name are macros, which will be
explained later. Macro kh_init()
initiates a hash table and kh_destroy()
frees it. kh_put()
inserts a key and returns the iterator (or the position)
in the hash table. kh_get()
and kh_del()
get a key and delete an element,
respectively. Macro kh_exist()
tests if an iterator (or a position) is filled
with data.
An immediate question is this piece of code does not look like a valid C
program (e.g. lacking semicolon, assignment to an apparent function call and
apparent undefined m32
'variable'). To understand why the code is correct,
let's go a bit further into the source code of khash.h
, whose skeleton looks
like:
#define KHASH_INIT(name, SCOPE, key_t, val_t, is_map, _hashf, _hasheq) \
typedef struct { \
int n_buckets, size, n_occupied, upper_bound; \
unsigned *flags; \
key_t *keys; \
val_t *vals; \
} kh_##name##_t; \
SCOPE inline kh_##name##_t *init_##name() { \
return (kh_##name##_t*)calloc(1, sizeof(kh_##name##_t)); \
} \
SCOPE inline int get_##name(kh_##name##_t *h, key_t k) \
... \
SCOPE inline void destroy_##name(kh_##name##_t *h) { \
if (h) { \
free(h->keys); free(h->flags); free(h->vals); free(h); \
} \
}
#define _int_hf(key) (unsigned)(key)
#define _int_heq(a, b) (a == b)
#define khash_t(name) kh_##name##_t
#define kh_value(h, k) ((h)->vals[k])
#define kh_begin(h, k) 0
#define kh_end(h) ((h)->n_buckets)
#define kh_init(name) init_##name()
#define kh_get(name, h, k) get_##name(h, k)
#define kh_destroy(name, h) destroy_##name(h)
...
#define KHASH_MAP_INIT_INT(name, val_t) \
KHASH_INIT(name, static, unsigned, val_t, is_map, _int_hf, _int_heq)
KHASH_INIT()
is a huge macro defining all the structs and methods. When this
macro is called, all the code inside it will be inserted by the C
preprocess to the place where it is called. If the macro is called
multiple times, multiple copies of the code will be inserted. To avoid naming
conflict of hash tables with different key-value types, the library uses token
concatenation, which is a preprocessor feature whereby we can substitute
part of a symbol based on the parameter of the macro. In the end, the C
preprocessor will generate the following code and feed it to the compiler
(macro kh_exist(h,k)
is a little complex and not expanded for simplicity):
typedef struct {
int n_buckets, size, n_occupied, upper_bound;
unsigned *flags;
unsigned *keys;
char *vals;
} kh_m32_t;
static inline kh_m32_t *init_m32() {
return (kh_m32_t*)calloc(1, sizeof(kh_m32_t));
}
static inline int get_m32(kh_m32_t *h, unsigned k)
...
static inline void destroy_m32(kh_m32_t *h) {
if (h) {
free(h->keys); free(h->flags); free(h->vals); free(h);
}
}
int main() {
int ret, is_missing;
khint_t k;
kh_m32_t *h = init_m32();
k = put_m32(h, 5, &ret);
if (!ret) del_m32(h, k);
h->vals[k] = 10;
k = get_m32(h, 10);
is_missing = (k == h->n_buckets);
k = get_m32(h, 5);
del_m32(h, k);
for (k = 0; k != h->n_buckets; ++k)
if (kh_exist(h, k)) h->vals[k] = 1;
destroy_m32(h);
return 0;
}
This is the C program we know.
From this example, we can see that macros and the C preprocessor plays a key
role in klib. Klib is fast partly because the compiler knows the key-value
type at the compile time and is able to optimize the code to the same level
as type-specific code. A generic library written with void*
will not get such
performance boost.
Massively inserting code upon instantiation may remind us of C++'s slow compiling speed and huge binary size when STL/boost is in use. Klib is much better in this respect due to its small code size and component independency. Inserting several hundreds lines of code won't make compiling obviously slower.
Resources
- Library documentation, if present, is available in the header files. Examples can be found in the test/ directory.
- Obsolete documentation of the hash table library can be found at SourceForge. This README is partly adapted from the old documentation.
- Blog post describing the hash table library.
- Blog post on why using
void*
for generic programming may be inefficient. - Blog post on the generic stream buffer.
- Blog post evaluating the performance of
kvec.h
. - Blog post arguing B-tree may be a better data structure than a binary search tree.
- Blog post evaluating the performance of
khash.h
andkbtree.h
among many other implementations. An older version of the benchmark is also available. - Blog post benchmarking internal sorting algorithms and implementations.
- Blog post on the k-small algorithm.
- Blog post on the Hooke-Jeeve's algorithm for nonlinear programming.
Top Related Projects
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot