snmalloc

Message passing based allocator

Quick Overview

snmalloc is a high-performance memory allocator developed by Microsoft Research. It is designed to be fast, memory-efficient, and scalable for multi-threaded applications. snmalloc minimizes contention with a novel message-passing scheme: memory freed by a thread other than the one that allocated it is returned to the original allocator without taking locks. This makes it particularly suitable for large-scale server applications.

Pros

Excellent performance in multi-threaded environments
Low memory fragmentation and efficient memory usage
Thread-local caching for improved speed and reduced contention
Easy to integrate into existing projects

Cons

May not be optimal for small, single-threaded applications
Requires C++17 or later, which might limit compatibility with older codebases
Documentation could be more comprehensive for advanced usage scenarios
Relatively new compared to some other established allocators

Code Examples

Basic usage of snmalloc:

```
#include <snmalloc.h>

int main() {
    void* ptr = snmalloc::ThreadAlloc::get().alloc(1024);
    // Use the allocated memory
    snmalloc::ThreadAlloc::get().dealloc(ptr);
    return 0;
}
```

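Using snmalloc with custom alignment. `alloc` takes only a size, so this example assumes the `snmalloc::aligned_size(alignment, size)` helper that snmalloc's own `memalign` shim uses to round the request up; check that it is available in your snmalloc version:

```
#include <snmalloc.h>

int main() {
    constexpr size_t alignment = 64;
    // Round the size up so the resulting size class is 64-byte aligned.
    void* ptr = snmalloc::ThreadAlloc::get().alloc(
        snmalloc::aligned_size(alignment, 1024));
    // Use the aligned memory
    snmalloc::ThreadAlloc::get().dealloc(ptr);
    return 0;
}
```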
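Requesting a larger size is enough to obtain an aligned block from an allocator whose size classes are naturally aligned. The standalone sketch below shows only that arithmetic, under two stated assumptions: the alignment is a power of two, and each block is aligned to the largest power of two dividing its size. `aligned_size_sketch` is a hypothetical name that returns the smallest multiple of `alignment` that is at least `size`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration.
// Assumes: alignment is a power of two and size is non-zero.
// (size - 1) | (alignment - 1) moves size - 1 to the last byte of its
// alignment-sized block; adding one yields the next multiple of alignment.
constexpr std::size_t aligned_size_sketch(std::size_t alignment, std::size_t size)
{
  return ((alignment - 1) | (size - 1)) + 1;
}

static_assert(aligned_size_sketch(64, 100) == 128, "rounded up to 2 * 64");
static_assert(aligned_size_sketch(64, 1024) == 1024, "already a multiple");
```

For example, a 100-byte request with 64-byte alignment becomes 128 bytes, while a 1024-byte request is already a multiple of 64 and stays unchanged.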

Allocating an array of objects:

```
#include <snmalloc.h>
#include <new>

struct MyStruct { int x; double y; };

int main() {
    size_t count = 100;
    MyStruct* arr = static_cast<MyStruct*>(snmalloc::ThreadAlloc::get().alloc(sizeof(MyStruct) * count));
    
    // Construct each element in place in the raw allocation
    for (size_t i = 0; i < count; i++) {
        new (&arr[i]) MyStruct();
    }
    
    // Use the array
    
    // Destroy the elements before returning the raw memory
    for (size_t i = 0; i < count; i++) {
        arr[i].~MyStruct();
    }
    
    snmalloc::ThreadAlloc::get().dealloc(arr);
    return 0;
}
```

Getting Started

To use snmalloc in your project:

Clone the repository:

```
git clone https://github.com/microsoft/snmalloc.git
```

Add snmalloc as a subdirectory in your CMakeLists.txt:
```
add_subdirectory(path/to/snmalloc)
```

Link your target with snmalloc:

```
target_link_libraries(your_target snmalloc)
```

Include snmalloc in your C++ code:
```
#include <snmalloc.h>
```

Compile your project with C++17 or later (when building without CMake, also add snmalloc's headers to your include path):

```
g++ -std=c++17 your_code.cpp -o your_program
```

Competitor Comparisons

rpmalloc

Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C

Pros of rpmalloc

Simpler implementation, making it easier to understand and maintain
Lower memory overhead for small allocations
Better performance on some benchmarks, particularly for small allocations

Cons of rpmalloc

Less focus on security features compared to snmalloc
May not perform as well for large allocations or high-concurrency scenarios
Less extensive documentation and community support

Code Comparison

rpmalloc:

```
// Abridged excerpt: error handling and span-cache lookups are omitted.
static void* _rpmalloc_allocate_large(heap_t* heap, size_t size) {
    // Round the request plus the span header up to a whole number of spans.
    size_t total_size = size + SPAN_HEADER_SIZE;
    size_t span_count = total_size >> _memory_span_size_shift;
    if (total_size & (_memory_span_size - 1))
        ++span_count;
    span_t* span = _rpmalloc_heap_allocate_spans(heap, span_count);
    return pointer_offset(span, SPAN_HEADER_SIZE);
}
```

snmalloc:

```
void* Alloc::alloc_large(size_t size)
{
  size = round_size(size);
  auto sizeclass = size_to_sizeclass(size);
  auto rsize = sizeclass_to_size(sizeclass);
  auto span = large_allocator.alloc(rsize);
  if (span == nullptr)
    return nullptr;
  return span->start;
}
```

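Both excerpts round a large request up before obtaining memory: rpmalloc rounds the request plus its span header up to a whole number of spans, while snmalloc rounds it to a size class and obtains a block of that size from its large allocator. snmalloc's version delegates more of this work to higher-level helpers.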
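The two rounding strategies can be compared with a small standalone calculation. The sketch below uses hypothetical constants (a 64 KiB span and a 128-byte span header), not either allocator's configured values, and models large size classes as powers of two, which is a simplifying assumption rather than a description of snmalloc's exact size-class table:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants for illustration only.
constexpr std::size_t kSpanSize = 64 * 1024;  // assumed span size
constexpr std::size_t kSpanHeaderSize = 128;  // assumed span header size

// Span counting: ceil((size + header) / span_size).
constexpr std::size_t spans_needed(std::size_t size)
{
  std::size_t total = size + kSpanHeaderSize;
  return (total + kSpanSize - 1) / kSpanSize;
}

// Size-class rounding, simplified to the next power of two.
// Assumes size > 0 and small enough that p does not overflow.
constexpr std::size_t next_pow2(std::size_t size)
{
  std::size_t p = 1;
  while (p < size)
    p <<= 1;
  return p;
}
```

With these assumed constants, a 140,000-byte request needs three spans (196,608 bytes) under span counting but rounds to 262,144 bytes as a power of two, so the two schemes waste memory differently depending on the request size.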

jemalloc

Pros of jemalloc

Widely adopted and battle-tested in large-scale production environments
Extensive performance tuning options and customization capabilities
Strong support for multi-threaded applications and scalability

Cons of jemalloc

More complex codebase, potentially harder to maintain or modify
Higher memory overhead in some scenarios due to its sophisticated management structures
May require more fine-tuning to achieve optimal performance in specific use cases

Code Comparison

jemalloc:

```
void *ptr = malloc(size);
free(ptr);
```

snmalloc:

```
void *ptr = snmalloc::ThreadAlloc::get().alloc(size);
snmalloc::ThreadAlloc::get().dealloc(ptr);
```

Key Differences

snmalloc is designed with a focus on security and memory isolation
jemalloc offers more extensive statistics and debugging features
snmalloc has a simpler codebase, potentially easier to integrate and maintain
jemalloc provides better support for fragmentation reduction in long-running applications

Both allocators aim to improve performance and memory efficiency, but they take different approaches. snmalloc emphasizes security and simplicity, while jemalloc focuses on scalability and extensive customization options. The choice between them depends on specific project requirements and use cases.

tcmalloc

Pros of tcmalloc

Highly optimized for multi-threaded applications
Extensive performance profiling and debugging tools
Wider adoption and longer history of use in production environments

Cons of tcmalloc

More complex implementation, potentially harder to maintain
May have higher memory overhead for small allocations
Less focus on security features compared to snmalloc

Code Comparison

snmalloc:

```
void* alloc = snmalloc::ThreadAlloc::get().alloc(size);
snmalloc::ThreadAlloc::get().dealloc(alloc);
```

tcmalloc:

```
void* alloc = tc_malloc(size);
tc_free(alloc);
```

Both libraries provide similar APIs for allocation and deallocation, but snmalloc uses a thread-local allocator object, while tcmalloc uses global functions.
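The difference between a per-thread allocator object and global functions can be illustrated with a toy cache. In the hypothetical `ThreadCache` below, each thread owns a `thread_local` list of freed 64-byte blocks in front of `std::malloc`, so a free followed by an allocation on the same thread reuses the block without locks or atomics. It shows the general pattern only, not the data structures of snmalloc or tcmalloc:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fixed block size for this sketch.
constexpr std::size_t kBlockSize = 64;

struct ThreadCache
{
  std::vector<void*> free_blocks; // blocks freed on this thread

  // Return cached blocks to the global allocator when the thread exits.
  ~ThreadCache()
  {
    for (void* p : free_blocks)
      std::free(p);
  }

  void* alloc()
  {
    if (!free_blocks.empty())
    {
      void* p = free_blocks.back(); // fast path: reuse a local block
      free_blocks.pop_back();
      return p;
    }
    return std::malloc(kBlockSize); // slow path: global allocator
  }

  void dealloc(void* p)
  {
    free_blocks.push_back(p); // no synchronisation needed
  }
};

// One cache per thread, reached through an accessor function.
inline ThreadCache& thread_cache()
{
  thread_local ThreadCache cache;
  return cache;
}
```

In this sketch a block freed on another thread simply lands in that thread's cache, so memory can drift between threads. snmalloc instead returns such frees to the allocator that owns them.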

snmalloc focuses on simplicity and security, with features like randomized allocation and memory zeroing. tcmalloc emphasizes performance and scalability, particularly for multi-threaded applications.

tcmalloc offers more advanced profiling and debugging tools, which can be beneficial for large-scale projects. However, snmalloc's simpler design may make it easier to integrate and maintain in smaller projects or those with specific security requirements.

mimalloc

mimalloc is a compact general purpose allocator with excellent performance.

Pros of mimalloc

Higher performance in multi-threaded scenarios
More extensive documentation and benchmarks
Wider adoption and community support

Cons of mimalloc

Slightly larger memory footprint
More complex implementation, potentially harder to maintain

Code comparison

mimalloc:

```
#include <mimalloc.h>

void* p = mi_malloc(sizeof(int));
mi_free(p);
```

snmalloc:

```
#include <snmalloc.h>

void* p = snmalloc::ThreadAlloc::get().alloc(sizeof(int));
snmalloc::ThreadAlloc::get().dealloc(p);
```

Key differences

mimalloc exposes C-style global functions (mi_malloc, mi_free), while snmalloc's C++ API lives in the snmalloc namespace
mimalloc focuses on performance optimizations for common allocation patterns
snmalloc emphasizes memory isolation and security features

Performance

mimalloc generally outperforms snmalloc in multi-threaded scenarios
snmalloc may have an edge in single-threaded applications

Memory usage

snmalloc typically has a smaller memory footprint
mimalloc trades some memory overhead for improved performance

Compatibility

Both allocators can be used as drop-in replacements for standard malloc
mimalloc has broader platform support and more extensive testing

Community and development

mimalloc has a larger user base and more frequent updates
snmalloc focuses on specific use cases and security features

README

snmalloc

snmalloc is a high-performance allocator. It can be used directly in a project as a header-only C++ library or LD_PRELOADed on ELF platforms (e.g. Linux, BSD), and a crate is available for using it from Rust.

Its key design features are:

Memory that is freed by the same thread that allocated it does not require any synchronising operations.
Freeing memory on a different thread from the one that allocated it does not take any locks. Instead, a novel message-passing scheme returns the memory to the original allocator, where it is recycled. This allows thousands of remote deallocations to be performed with a single atomic operation, so performance scales well with core count.
The allocator uses large ranges of pages to reduce the amount of meta-data required.
The fast paths are highly optimised, with just two branches on the fast path for malloc (on Linux, compiled with Clang).
The platform dependencies are abstracted away to enable porting to other platforms.
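The remote-deallocation idea can be sketched with an intrusive multi-producer, single-consumer queue in the style of Dmitry Vyukov's well-known design. A thread that frees objects owned by another allocator links them into a batch privately, then publishes the whole batch with one atomic `exchange`; only the owning thread drains the queue. `FreedObject` and `RemoteQueue` are hypothetical names, and this is a simplified illustration rather than snmalloc's implementation:

```cpp
#include <atomic>
#include <cassert>

// A freed object's own memory is reused as the queue link.
struct FreedObject
{
  std::atomic<FreedObject*> next{nullptr};
};

class RemoteQueue
{
  FreedObject stub;                      // sentinel, never handed out
  std::atomic<FreedObject*> back{&stub}; // producers exchange here
  FreedObject* front = &stub;            // touched only by the owner

public:
  // Producer: publish a pre-linked batch first..last with one exchange,
  // regardless of how many objects the batch contains.
  void enqueue(FreedObject* first, FreedObject* last)
  {
    last->next.store(nullptr, std::memory_order_relaxed);
    FreedObject* prev = back.exchange(last, std::memory_order_acq_rel);
    prev->next.store(first, std::memory_order_release);
  }

  // Owner: take one object, or nullptr if none is visible yet.
  FreedObject* dequeue()
  {
    FreedObject* f = front;
    FreedObject* next = f->next.load(std::memory_order_acquire);
    if (f == &stub)
    {
      if (next == nullptr)
        return nullptr; // queue is empty
      front = f = next; // skip the sentinel
      next = next->next.load(std::memory_order_acquire);
    }
    if (next != nullptr)
    {
      front = next;
      return f;
    }
    // f is the last visible node; a producer may be mid-enqueue.
    if (f != back.load(std::memory_order_acquire))
      return nullptr;
    // Re-insert the sentinel behind f so that f can be handed out.
    enqueue(&stub, &stub);
    next = f->next.load(std::memory_order_acquire);
    if (next != nullptr)
    {
      front = next;
      return f;
    }
    return nullptr;
  }
};
```

Between a producer's `exchange` and its store to `prev->next`, the new batch is not yet reachable; `dequeue` returns `nullptr` in that window instead of blocking, and the owner can simply try again later.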

snmalloc's design is particularly well suited to the following two difficult scenarios, which can be problematic for other allocators:

Allocations on one thread are freed by a different thread
Deallocations occur in large batches

Both of these can cause massive reductions in performance of other allocators, but do not for snmalloc.

The implementation of snmalloc has evolved significantly since the initial paper. The mechanism for returning memory to remote threads has remained, but most of the meta-data layout has changed. We recommend reading docs/security to learn about the current design; if you want to dive into the code, docs/AddressSpace.md provides a good overview of the allocation and deallocation paths.

Hardening

There is a hardened version of snmalloc. It provides:

Randomisation of the allocations' relative locations,
Storage of most meta-data separately from allocations, protected with guard pages,
Protection of all in-band meta-data with a novel encoding that can detect corruption, and
A memcpy that automatically checks the bounds relative to the underlying malloc.
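The bounds-checked memcpy idea can be illustrated with a toy allocator that remembers each block's size. In the hypothetical `TrackingAllocator` below, a `std::map` stands in for the allocator's own meta-data lookup, and `checked_memcpy` refuses any copy that would run past the end of the destination block. This sketch checks only the destination and returns `false` on a violation; it is not snmalloc's implementation, and a hardened allocator would more typically report the error and stop the program:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <map>

class TrackingAllocator
{
  std::map<std::uintptr_t, std::size_t> blocks; // start address -> size

public:
  void* alloc(std::size_t size)
  {
    void* p = std::malloc(size);
    if (p != nullptr)
      blocks[reinterpret_cast<std::uintptr_t>(p)] = size;
    return p;
  }

  void dealloc(void* p)
  {
    blocks.erase(reinterpret_cast<std::uintptr_t>(p));
    std::free(p);
  }

  // Bytes from p to the end of the live block containing p, or 0 if p
  // does not point into a live block.
  std::size_t remaining_bytes(const void* p) const
  {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto it = blocks.upper_bound(addr); // first block starting after p
    if (it == blocks.begin())
      return 0;
    --it; // last block starting at or before p
    std::uintptr_t end = it->first + it->second;
    return addr < end ? end - addr : 0;
  }

  // Copies only if [dst, dst + len) fits inside dst's block.
  bool checked_memcpy(void* dst, const void* src, std::size_t len) const
  {
    if (len > remaining_bytes(dst))
      return false; // would overflow the destination allocation
    std::memcpy(dst, src, len);
    return true;
  }
};
```

A real allocator can answer the "how many bytes remain" query from its size-class meta-data in constant time; the map lookup here only keeps the sketch self-contained.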

A more comprehensive write up is in docs/security.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
