libdeflate
Heavily optimized library for DEFLATE/zlib/gzip compression and decompression
Top Related Projects
A massively spiffy yet delicately unobtrusive compression library.
Zopfli Compression Algorithm is a compression library programmed in C to perform very good, but slow, deflate or zlib compression.
Zstandard - Fast real-time compression algorithm
Extremely Fast Compression algorithm
miniz: Single C source file zlib-replacement library, originally from code.google.com/p/miniz
Quick Overview
Libdeflate is a high-performance library for DEFLATE/zlib/gzip compression and decompression. It provides a fast and efficient implementation of these algorithms, focusing on both speed and compression ratio. The library is designed to be easily integrated into existing projects and offers a simple API for developers.
Pros
- Significantly faster than zlib in most cases, for both compression and decompression
- Provides better compression ratios compared to zlib at comparable speed levels
- Offers a simple and easy-to-use API for integration into existing projects
- Supports various compression formats including DEFLATE, zlib, and gzip
Cons
- Limited to DEFLATE-based compression algorithms, not suitable for other compression methods
- May require additional effort to integrate into projects heavily reliant on zlib
- Not as widely adopted or supported as zlib in some ecosystems
- No streaming API; only whole-buffer compression and decompression are supported
Code Examples
- Compressing data:
#include <libdeflate.h>
#include <stdlib.h>
// Compress data into the zlib format at level 6
struct libdeflate_compressor *compressor = libdeflate_alloc_compressor(6);
// Worst-case output size; the bound function takes the compressor as its first argument
size_t bound = libdeflate_zlib_compress_bound(compressor, decompressed_size);
char *compressed_data = malloc(bound);
// Returns the compressed size, or 0 if the output did not fit in the buffer
size_t compressed_size = libdeflate_zlib_compress(compressor, decompressed_data, decompressed_size, compressed_data, bound);
libdeflate_free_compressor(compressor);
- Decompressing data:
#include <libdeflate.h>
// Decompress data; result is LIBDEFLATE_SUCCESS when decompression succeeds
size_t actual_decompressed_size;
char *decompressed_data = malloc(expected_decompressed_size);
struct libdeflate_decompressor *decompressor = libdeflate_alloc_decompressor();
enum libdeflate_result result = libdeflate_zlib_decompress(decompressor, compressed_data, compressed_size, decompressed_data, expected_decompressed_size, &actual_decompressed_size);
libdeflate_free_decompressor(decompressor);
- Checking CRC32:
#include <libdeflate.h>
// Calculate CRC32 (pass 0 as the initial value when starting a new checksum)
uint32_t crc = libdeflate_crc32(0, data, data_size);
Getting Started
To use libdeflate in your project, follow these steps:
- Clone the repository:
git clone https://github.com/ebiggers/libdeflate.git
- Build the library with CMake:
cd libdeflate
cmake -B build && cmake --build build
- Include the library in your project:
#include <libdeflate.h>
- Link against the built library when compiling your project:
gcc -o your_program your_program.c -I/path/to/libdeflate -L/path/to/libdeflate/build -ldeflate
Remember to adjust the include and library paths according to your project structure.
Competitor Comparisons
A massively spiffy yet delicately unobtrusive compression library.
Pros of zlib
- Widely adopted and supported across many platforms and programming languages
- Extensive documentation and community resources
- Proven stability and reliability in production environments
Cons of zlib
- Generally slower compression and decompression speeds compared to libdeflate
- Less optimized for modern CPU architectures
- Limited focus on specific use cases, as it aims to be a general-purpose library
Code Comparison
zlib:
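z_stream strm = {0}; // zero-initialize so zalloc, zfree, and opaque are Z_NULL
deflateInit(&strm, Z_DEFAULT_COMPRESSION);
strm.next_in = in; strm.avail_in = in_size; // input buffer
strm.next_out = out; strm.avail_out = out_nbytes; // output buffer
deflate(&strm, Z_FINISH); // returns Z_STREAM_END once all output has been written
deflateEnd(&strm);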
libdeflate:
struct libdeflate_compressor *compressor;
compressor = libdeflate_alloc_compressor(6);
libdeflate_deflate_compress(compressor, in, in_size, out, out_nbytes);
libdeflate_free_compressor(compressor);
Key Differences
- libdeflate focuses on optimizing compression and decompression speeds, particularly on x86 and ARM processors
- zlib offers a more comprehensive set of features and wider compatibility
- libdeflate provides a simpler API with fewer functions, while zlib has a more extensive API
- zlib supports streaming compression and decompression, whereas libdeflate has no streaming API and works on whole buffers only
- libdeflate is designed for modern systems and may not be suitable for older or resource-constrained environments
Zopfli Compression Algorithm is a compression library programmed in C to perform very good, but slow, deflate or zlib compression.
Pros of Zopfli
- Achieves higher compression ratios, especially for static content
- Offers multiple output formats (DEFLATE, zlib, gzip)
- Suitable for scenarios where compression time is not critical
Cons of Zopfli
- Significantly slower compression speed compared to libdeflate
- Limited to compression only, no decompression functionality
- Not optimized for real-time or on-the-fly compression scenarios
Code Comparison
Zopfli:
ZopfliOptions options;
ZopfliInitOptions(&options);
unsigned char* out = 0;
size_t outsize = 0;
ZopfliCompress(&options, ZOPFLI_FORMAT_DEFLATE, in, insize, &out, &outsize); // out is allocated by Zopfli
libdeflate:
struct libdeflate_compressor *compressor = libdeflate_alloc_compressor(6);
size_t compressed_size = libdeflate_deflate_compress(compressor,
in, insize,
out, outsize);
libdeflate_free_compressor(compressor);
Summary
Zopfli focuses on achieving maximum compression ratios at the cost of speed, making it ideal for compressing static content where compression time is not a concern. libdeflate, on the other hand, prioritizes speed and efficiency, offering both compression and decompression. In the code comparison, Zopfli allocates the output buffer itself, whereas libdeflate writes into a buffer provided by the caller.
Zstandard - Fast real-time compression algorithm
Pros of zstd
- Higher compression ratios, especially for larger files
- Faster decompression speeds
- More versatile with multiple compression levels and dictionary support
Cons of zstd
- Larger library size and more complex implementation
- Slightly slower compression speeds for small files
- Less focus on embedded systems and resource-constrained environments
Code Comparison
zstd:
size_t ZSTD_compress(void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
int compressionLevel);
libdeflate:
size_t libdeflate_deflate_compress(struct libdeflate_compressor *compressor,
const void *in, size_t in_nbytes,
void *out, size_t out_nbytes_avail);
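Both libraries offer simple one-call compression functions. zstd takes the compression level as a parameter of each call, while libdeflate fixes the level when the compressor is allocated with libdeflate_alloc_compressor(). zstd additionally supports dictionaries, whereas libdeflate focuses on a streamlined API for DEFLATE compression.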
zstd is generally better suited for applications requiring high compression ratios and fast decompression, while libdeflate excels in scenarios where a lightweight, efficient implementation of the deflate algorithm is needed, particularly in embedded systems or when working with smaller files.
Extremely Fast Compression algorithm
Pros of lz4
- Extremely fast compression and decompression speeds
- Wide platform support and language bindings
- Active development and large community
Cons of lz4
- Lower compression ratio compared to other algorithms
- Limited configuration options for fine-tuning
Code Comparison
lz4:
char* compressed = (char*)malloc(LZ4_compressBound(inputSize));
int compressedSize = LZ4_compress_default(input, compressed, inputSize, LZ4_compressBound(inputSize));
libdeflate:
size_t compressed_size = libdeflate_deflate_compress(compressor, input, input_size, compressed, max_compressed_size);
Key Differences
- libdeflate focuses on DEFLATE, zlib, and gzip compression, while lz4 uses its own algorithm
- libdeflate offers better compression ratios, while lz4 prioritizes speed
- lz4 has a simpler API and is easier to integrate into existing projects
- libdeflate provides more fine-grained control over compression levels and memory usage
Use Cases
- lz4: Real-time compression, fast data transfer, and scenarios where speed is crucial
- libdeflate: Applications requiring better compression ratios and compatibility with existing DEFLATE-based formats
Both libraries have their strengths, and the choice between them depends on specific project requirements, prioritizing either speed or compression efficiency.
miniz: Single C source file zlib-replacement library, originally from code.google.com/p/miniz
Pros of miniz
- Simpler API and easier to integrate into existing projects
- Supports both compression and decompression in a single library
- Includes additional utilities like ZIP file handling
Cons of miniz
- Generally slower compression and decompression speeds
- Less optimized for modern CPU architectures
- May have higher memory usage in some scenarios
Code Comparison
miniz:
mz_ulong src_len = (mz_ulong)strlen(src_buf);
mz_ulong compressed_size = mz_compressBound(src_len);
unsigned char *pCompressed_data = (unsigned char *)malloc(compressed_size);
if (mz_compress(pCompressed_data, &compressed_size, (const unsigned char *)src_buf, src_len) != MZ_OK) {
// Handle error
}
libdeflate:
size_t src_len = strlen(src_buf);
size_t compressed_size = libdeflate_deflate_compress_bound(compressor, src_len);
unsigned char *compressed_data = malloc(compressed_size);
size_t actual_size = libdeflate_deflate_compress(compressor, src_buf, src_len, compressed_data, compressed_size);
if (actual_size == 0) {
// Handle error
}
Both libraries provide compression functionality, but libdeflate offers more fine-grained control over compression levels and typically achieves better performance. miniz, however, provides a more comprehensive set of features in a single package, making it potentially more suitable for projects requiring additional functionality beyond basic compression and decompression.
README
Overview
libdeflate is a library for fast, whole-buffer DEFLATE-based compression and decompression.
The supported formats are:
- DEFLATE (raw)
- zlib (a.k.a. DEFLATE with a zlib wrapper)
- gzip (a.k.a. DEFLATE with a gzip wrapper)
libdeflate is heavily optimized. It is significantly faster than the zlib library, both for compression and decompression, and especially on x86 and ARM processors. In addition, libdeflate provides optional high compression modes that provide a better compression ratio than zlib's "level 9".
libdeflate itself is a library. The following command-line programs which use this library are also included:
- libdeflate-gzip, a program which can be a drop-in replacement for standard gzip under some circumstances. Note that libdeflate-gzip has some limitations; it is provided for convenience and is not meant to be the main use case of libdeflate. It needs a lot of memory to process large files, and it omits support for some infrequently-used options of GNU gzip.
- benchmark, a test program that does round-trip compression and decompression of the provided data, and measures the compression and decompression speed. It can use libdeflate, zlib, or a combination of the two.
- checksum, a test program that checksums the provided data with Adler-32 or CRC-32, and optionally measures the speed. It can use libdeflate or zlib.
For the release notes, see the NEWS file.
Table of Contents
- Building
- API
- Bindings for other programming languages
- DEFLATE vs. zlib vs. gzip
- Compression levels
- Motivation
- License
Building
Using CMake
libdeflate uses CMake. It can be built just like any other CMake project, e.g. with:
cmake -B build && cmake --build build
By default the following targets are built:
- The static library (normally called libdeflate.a)
- The shared library (normally called libdeflate.so)
- The libdeflate-gzip program, including its alias libdeflate-gunzip
Besides the standard CMake build and installation options, there are some libdeflate-specific build options. See CMakeLists.txt for the list of these options. To set an option, add -DOPTION=VALUE to the cmake command.
Prebuilt Windows binaries can be downloaded from https://github.com/ebiggers/libdeflate/releases.
Directly integrating the library sources
Although the official build system is CMake, care has been taken to keep the library source files compilable directly, without a prerequisite configuration step. Therefore, it is also fine to just add the library source files directly to your application, without using CMake.
You should compile both lib/*.c and lib/*/*.c. You don't need to worry about excluding irrelevant architecture-specific code, as this is already handled in the source files themselves using #ifdefs.
If you are doing a freestanding build with -ffreestanding, you must add -DFREESTANDING as well (matching what the CMakeLists.txt does).
Supported compilers
- gcc: v4.9 and later
- clang: v3.9 and later (upstream), Xcode 8 and later (Apple)
- MSVC: Visual Studio 2015 and later
- Other compilers: any other C99-compatible compiler should work, though if your compiler pretends to be gcc, clang, or MSVC, it needs to be sufficiently compatible with the compiler it pretends to be.
The above are the minimums, but using a newer compiler allows more of the architecture-optimized code to be built. libdeflate is most heavily optimized for gcc and clang, but MSVC is supported fairly well now too.
The recommended optimization flag is -O2, and the CMakeLists.txt sets this for release builds. -O3 is fine too, but often -O2 actually gives better results. It's unnecessary to add flags such as -mavx2 or /arch:AVX2, though you can do so if you want to. Most of the relevant optimized functions are built regardless of such flags, and appropriate ones are selected at runtime. For the same reason, flags like -mno-avx2 do not cause all code using the corresponding instruction set extension to be omitted from the binary; this is working as intended due to the use of runtime CPU feature detection.
If using gcc, your gcc should always be paired with a binutils version that is not much older than itself, to avoid problems where the compiler generates instructions the assembler cannot assemble. Usually systems have their gcc and binutils paired properly, but rarely a mismatch can arise in cases such as the user installing a newer gcc version without a proper binutils alongside it. Since libdeflate v1.22, the CMake-based build system will detect incompatible binutils versions and disable some optimized code accordingly. In older versions of libdeflate, or if CMake is not being used, a too-old binutils can cause build errors like "no such instruction" from the assembler.
API
libdeflate has a simple API that is not zlib-compatible. You can create compressors and decompressors and use them to compress or decompress buffers. See libdeflate.h for details.
There is currently no support for streaming. This has been considered, but it always significantly increases complexity and slows down fast paths. Unfortunately, at this point it remains a future TODO. So: if your application compresses data in "chunks", say, less than 1 MB in size, then libdeflate is a great choice for you; that's what it's designed to do. This is perfect for certain use cases such as transparent filesystem compression. But if your application compresses large files as a single compressed stream, similarly to the gzip program, then libdeflate isn't for you.
Note that with chunk-based compression, you generally should have the uncompressed size of each chunk stored outside of the compressed data itself. This enables you to allocate an output buffer of the correct size without guessing. However, libdeflate's decompression routines do optionally provide the actual number of output bytes in case you need it.
Windows developers: note that the calling convention of libdeflate.dll is "cdecl". (libdeflate v1.4 through v1.12 used "stdcall" instead.)
Bindings for other programming languages
The libdeflate project itself only provides a C library. If you need to use libdeflate from a programming language other than C or C++, consider using the following bindings:
- C#: LibDeflate.NET
- Delphi: libdeflate-pas
- Go: go-libdeflate
- Java: libdeflate-java
- Julia: LibDeflate.jl
- Nim: libdeflate-nim
- Perl: Gzip::Libdeflate
- PHP: ext-libdeflate
- Python: deflate
- Ruby: libdeflate-ruby
- Rust: libdeflater
Note: these are third-party projects which haven't necessarily been vetted by the authors of libdeflate. Please direct all questions, bugs, and improvements for these bindings to their authors.
Also, unfortunately many of these bindings bundle or pin an old version of libdeflate. To avoid known issues in old versions and to improve performance, before using any of these bindings please ensure that the bundled or pinned version of libdeflate has been upgraded to the latest release.
DEFLATE vs. zlib vs. gzip
The DEFLATE format (rfc1951), the zlib format (rfc1950), and the gzip format (rfc1952) are commonly confused with each other as well as with the zlib software library, which actually supports all three formats. libdeflate (this library) also supports all three formats.
Briefly, DEFLATE is a raw compressed stream, whereas zlib and gzip are different wrappers for this stream. Both zlib and gzip include checksums, but gzip can include extra information such as the original filename. Generally, you should choose a format as follows:
- If you are compressing whole files with no subdivisions, similar to the gzip program, you probably should use the gzip format.
- Otherwise, if you don't need the features of the gzip header and footer but do still want a checksum for corruption detection, you probably should use the zlib format.
- Otherwise, you probably should use raw DEFLATE. This is ideal if you don't need checksums, e.g. because they're simply not needed for your use case or because you already compute your own checksums that are stored separately from the compressed stream.
Note that gzip and zlib streams can be distinguished from each other based on their starting bytes, but this is not necessarily true of raw DEFLATE streams.
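As a rough illustration of that distinction, the sketch below guesses the wrapper from the first two bytes. It relies on the gzip magic bytes 0x1f 0x8b from RFC 1952 and on the zlib header rules from RFC 1950 (compression method 8, window size field at most 7, and the two header bytes forming a multiple of 31). The enum and function names are invented for this example and are not part of libdeflate.

```c
#include <assert.h>
#include <stddef.h>

enum stream_kind { KIND_GZIP, KIND_ZLIB, KIND_UNKNOWN };

/*
 * Hypothetical helper: guesses the wrapper format from the first two bytes.
 * KIND_UNKNOWN may mean raw DEFLATE or something else entirely.
 */
static enum stream_kind sniff_stream(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n < 2)
        return KIND_UNKNOWN;

    /* RFC 1952: every gzip member starts with the bytes 0x1f 0x8b. */
    if (p[0] == 0x1f && p[1] == 0x8b)
        return KIND_GZIP;

    /*
     * RFC 1950: CM (low nibble of CMF) is 8 for DEFLATE, CINFO (high
     * nibble) is at most 7, and CMF * 256 + FLG is a multiple of 31.
     */
    if ((p[0] & 0x0f) == 8 && (p[0] >> 4) <= 7 &&
        (((unsigned)p[0] << 8) | p[1]) % 31 == 0)
        return KIND_ZLIB;

    return KIND_UNKNOWN;
}
```

A KIND_ZLIB result is only a guess: a raw DEFLATE stream can happen to begin with bytes that pass the zlib header check, which is why raw streams cannot be reliably identified this way.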
Compression levels
An often-underappreciated fact of compression formats such as DEFLATE is that there are an enormous number of different ways that a given input could be compressed. Different algorithms and different amounts of computation time will result in different compression ratios, while remaining equally compatible with the decompressor.
For this reason, the commonly used zlib library provides nine compression levels. Level 1 is the fastest but provides the worst compression; level 9 provides the best compression but is the slowest. It defaults to level 6. libdeflate uses this same design but is designed to improve on both zlib's performance and compression ratio at every compression level. In addition, libdeflate's levels go up to 12 to make room for a minimum-cost-path based algorithm (sometimes called "optimal parsing") that can significantly improve on zlib's compression ratio.
If you are using DEFLATE (or zlib, or gzip) in your application, you should test different levels to see which works best for your application.
Motivation
Despite DEFLATE's widespread use mainly through the zlib library, in the compression community this format from the early 1990s is often considered obsolete. And in a few significant ways, it is.
So why implement DEFLATE at all, instead of focusing entirely on bzip2/LZMA/xz/LZ4/LZX/ZSTD/Brotli/LZHAM/LZFSE/[insert cool new format here]?
To do something better, you need to understand what came before. And it turns out that most ideas from DEFLATE are still relevant. Many of the newer formats have a structure similar to DEFLATE's, with different tweaks. The effects of trivial but very useful tweaks, such as increasing the sliding window size, are often confused with the effects of nontrivial but less useful tweaks. And actually, many of these formats are similar enough that common algorithms and optimizations (e.g. those dealing with LZ77 matchfinding) can be reused.
In addition, comparing compressors fairly is difficult because the performance of a compressor depends heavily on optimizations which are not intrinsic to the compression format itself. In this respect, the zlib library sometimes compares poorly to certain newer code because zlib is not well optimized for modern processors. libdeflate addresses this by providing an optimized DEFLATE implementation which can be used for benchmarking purposes. And, of course, real applications can use it as well.
License
libdeflate is MIT-licensed.
I am not aware of any patents or patent applications relevant to libdeflate.