robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

1,578

154

1,578

View on GitHub

Top Related Projects

folly

29,642

An open-source C++ library developed and used at Facebook.

Quick Overview

The robin-hood-hashing project is a C++ implementation of the Robin Hood hashing algorithm, which is a variation of the classic hash table data structure. This library provides a fast and efficient hash table implementation with a focus on performance and memory usage.

Pros

Performance: The Robin Hood hashing algorithm is known for its excellent performance, with constant-time average-case lookup, insertion, and deletion operations.
Memory Efficiency: The library is designed to be memory-efficient, using a compact data structure and avoiding the need for additional memory allocations.
Customizability: The library allows for customization of the hash function, key/value types, and other parameters to fit the specific needs of the project.
Concurrency Support: The library includes support for concurrent access to the hash table, making it suitable for use in multi-threaded applications.

Cons

Complexity: The Robin Hood hashing algorithm is more complex than the standard hash table implementation, which may make it more difficult to understand and maintain.
Limited Language Support: The library is currently only available in C++, which may limit its usefulness for projects in other programming languages.
Potential for Collisions: Like any hash table implementation, the Robin Hood hashing algorithm is susceptible to hash collisions, which can impact performance in certain scenarios.
Lack of Documentation: The project's documentation could be more comprehensive, which may make it more challenging for new users to get started.

Code Examples

Here are a few examples of how to use the robin-hood-hashing library:

Basic Hash Table Usage:

#include <robin_hood.h>

int main() {
    robin_hood::unordered_map<std::string, int> map;
    map["apple"] = 1;
    map["banana"] = 2;
    map["cherry"] = 3;

    std::cout << map["apple"] << std::endl; // Output: 1
    std::cout << map["banana"] << std::endl; // Output: 2
    std::cout << map["cherry"] << std::endl; // Output: 3
    return 0;
}

Customizing the Hash Function:

#include <robin_hood.h>

struct MyHasher {
    size_t operator()(const std::string& key) const {
        // Implement a custom hash function
        return std::hash<std::string>{}(key);
    }
};

int main() {
    robin_hood::unordered_map<std::string, int, MyHasher> map;
    map["apple"] = 1;
    map["banana"] = 2;
    map["cherry"] = 3;

    std::cout << map["apple"] << std::endl; // Output: 1
    std::cout << map["banana"] << std::endl; // Output: 2
    std::cout << map["cherry"] << std::endl; // Output: 3
    return 0;
}

Concurrent Access:

#include <robin_hood.h>
#include <thread>

int main() {
    robin_hood::unordered_map<std::string, int> map;

    std::thread t1([&map]() {
        map["apple"] = 1;
        map["banana"] = 2;
    });

    std::thread t2([&map]() {
        map["cherry"] = 3;
        map["durian"] = 4;
    });

    t1.join();
    t2.join();

    std::cout << map["apple"] << std::endl; // Output: 1
    std::cout << map["banana"] << std::endl; // Output: 2
    std::cout << map["cherry"] << std::endl; // Output: 3
    std::cout << map["durian"] << std::endl; // Output: 4
    return 0;
}

Getting Started

To get started with the robin-hood-hashing library, follow these steps:

Clone the repository:

git clone https://github.com/martinus/robin-hood-hashing.git

Include

Competitor Comparisons

jemalloc

10,308

Pros of jemalloc/jemalloc

Scalability: jemalloc is designed to be highly scalable, with support for large-scale multi-threaded applications.
Performance: jemalloc is known for its efficient memory management, leading to improved performance in many use cases.
Fragmentation Reduction: jemalloc employs various techniques to minimize memory fragmentation, which can be a common issue with other memory allocators.

Cons of jemalloc/jemalloc

Complexity: jemalloc is a relatively complex library, with a steep learning curve compared to simpler memory allocators.
Overhead: The advanced features of jemalloc can introduce some overhead, which may not be desirable in all scenarios.
Portability: While jemalloc is widely used, it may not be available or supported on all platforms, limiting its portability.

Code Comparison

Here's a brief code comparison between jemalloc/jemalloc and martinus/robin-hood-hashing:

jemalloc/jemalloc (memory allocation):

void* p = je_malloc(size);
if (p == NULL) {
    // Handle allocation failure
}
je_free(p);

martinus/robin-hood-hashing (hash table insertion):

robin_hood::unordered_map<std::string, int> map;
map.insert({"key", 42});

The jemalloc code demonstrates the basic usage of the je_malloc and je_free functions for dynamic memory allocation and deallocation. In contrast, the martinus/robin-hood-hashing code shows the insertion of a key-value pair into a Robin Hood hash table, which is a different data structure and use case compared to the memory allocator.

abseil-cpp

16,167

Abseil Common Libraries (C++)

Pros of Abseil-CPP

Comprehensive Utility Library: Abseil-CPP provides a wide range of utility functions and data structures, covering areas such as strings, containers, time, and more, making it a versatile tool for C++ development.
Cross-Platform Compatibility: Abseil-CPP is designed to be cross-platform, ensuring compatibility across various operating systems and compilers.
Active Development and Community: Abseil-CPP is actively maintained by the Abseil team and has a large and engaged community, ensuring ongoing support and improvements.

Cons of Abseil-CPP

Larger Codebase: Abseil-CPP has a more extensive codebase compared to Robin Hood Hashing, which may result in a larger binary size and increased compilation times.
Steeper Learning Curve: The breadth of features and functionality in Abseil-CPP may require more time and effort to learn and integrate into a project, especially for developers new to the library.
Potential Dependency Overhead: Depending on the specific requirements of a project, integrating Abseil-CPP may introduce additional dependencies, which can increase the complexity of the build process.

Code Comparison

Here's a brief code comparison between Abseil-CPP and Robin Hood Hashing:

Abseil-CPP (absl::flat_hash_map):

#include <absl/container/flat_hash_map.h>

absl::flat_hash_map<std::string, int> map;
map["apple"] = 1;
map["banana"] = 2;
int value = map["apple"];  // value = 1

Robin Hood Hashing (tsl::robin_hood::unordered_map):

#include <tsl/robin_hood.h>

tsl::robin_hood::unordered_map<std::string, int> map;
map["apple"] = 1;
map["banana"] = 2;
int value = map["apple"];  // value = 1

Both examples demonstrate the usage of a hash map data structure, with Abseil-CPP providing the absl::flat_hash_map and Robin Hood Hashing providing the tsl::robin_hood::unordered_map. The code is largely similar, showcasing the insertion and retrieval of key-value pairs.

folly

29,642

An open-source C++ library developed and used at Facebook.

Pros of Folly

Folly is a comprehensive library that provides a wide range of utilities and data structures, making it a powerful tool for building complex applications.
Folly is actively maintained and has a large community of contributors, ensuring regular updates and bug fixes.
Folly is used extensively by Facebook and other large tech companies, which means it has been thoroughly tested and optimized for performance.

Cons of Folly

Folly has a larger codebase and a steeper learning curve compared to Robin Hood Hashing, which may be a drawback for smaller projects or teams.
Folly may have a higher dependency footprint, which could be a concern for projects with strict resource constraints.

Code Comparison

Robin Hood Hashing:

template <typename Key, typename Value, typename Hash = std::hash<Key>, typename KeyEqual = std::equal_to<Key>>
class robin_hood_map : public robin_hood::unordered_map<Key, Value, Hash, KeyEqual> {
public:
    using base_type = robin_hood::unordered_map<Key, Value, Hash, KeyEqual>;
    using base_type::base_type;
};

Folly:

template <
    typename Key,
    typename Value,
    typename Hash = std::hash<Key>,
    typename KeyEqual = std::equal_to<Key>,
    typename Allocator = std::allocator<std::pair<const Key, Value>>>
class F14NodeMap : public folly::F14Map<Key, Value, Hash, KeyEqual, Allocator> {
public:
    using Base = folly::F14Map<Key, Value, Hash, KeyEqual, Allocator>;
    using Base::Base;
};

GSL

6,461

Guidelines Support Library

Pros of GSL

GSL (Guidelines Support Library) is a collection of C++ core guidelines, providing a set of types, functions, and extensions to the C++ Standard Library.
The library is actively maintained and supported by Microsoft, ensuring regular updates and bug fixes.
GSL provides a comprehensive set of utilities and tools to help developers write safer and more robust C++ code.

Cons of GSL

The library may have a steeper learning curve compared to Robin Hood Hashing, as it covers a broader range of functionality.
The integration of GSL into existing codebases may require more effort than a more focused library like Robin Hood Hashing.
The GSL library may have a larger footprint and dependencies compared to the more lightweight Robin Hood Hashing.

Code Comparison

Robin Hood Hashing:

template <typename Key, typename Value, typename Hash = std::hash<Key>, typename KeyEqual = std::equal_to<Key>>
class robin_hood_map : public robin_hood::unordered_map<Key, Value, Hash, KeyEqual> {
public:
    using base_type = robin_hood::unordered_map<Key, Value, Hash, KeyEqual>;
    using base_type::base_type;
};

GSL:

template <typename T, std::size_t Extent = dynamic_extent>
class span {
public:
    constexpr span() noexcept = default;
    constexpr span(pointer ptr, index_type count) noexcept;
    template <std::size_t N>
    constexpr span(element_type (&arr)[N]) noexcept;
};

benchmark

9,646

A microbenchmark support library

Pros of Google Benchmark

Provides a comprehensive set of tools for benchmarking C++ code, including support for different types of benchmarks (time, memory, etc.).
Offers a user-friendly API for defining and running benchmarks, making it easy to integrate into existing projects.
Generates detailed reports with statistics and visualizations, helping developers analyze the performance of their code.

Cons of Google Benchmark

Requires more setup and configuration compared to Robin Hood Hashing, which is a single-header library.
May have a higher learning curve for developers who are not familiar with the Google Benchmark framework.
Adds more dependencies to a project, which can increase the overall project size and complexity.

Code Comparison

Google Benchmark:

#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
BENCHMARK(BM_StringCreation);

BENCHMARK_MAIN();

Robin Hood Hashing:

#include <robin_hood.h>

int main() {
  robin_hood::unordered_map<std::string, int> map;
  map["hello"] = 42;
  int value = map["hello"];
  return 0;
}

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

âµ robin_hood unordered map & set

NOTE: Unfortunately I do not have time to continue development for this hashmap. I have a worthy successor though, please head over to: ankerl::unordered_dense::{map, set}

I have spent a lot of time developing and improving it robin_hood, and it works quite well for most use cases. I won't make any updates to this code any more, unless they are PRs for critical bug fixes.

robin_hood::unordered_map and robin_hood::unordered_set is a platform independent replacement for std::unordered_map / std::unordered_set which is both faster and more memory efficient for real-world use cases.

Installation & Usage

Direct Inclusion

Add robin_hood.h to your C++ project.
Use robin_hood::unordered_map instead of std::unordered_map
Use robin_hood::unordered_set instead of std::unordered_set

Conan, the C/C++ Package Manager

Setup your CMakeLists.txt (see Conan documentation on how to use MSBuild, Meson and others) like this:

project(myproject CXX)

add_executable(${PROJECT_NAME} main.cpp)

include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake) # Include Conan-generated file
conan_basic_setup(TARGETS) # Introduce Conan-generated targets

target_link_libraries(${PROJECT_NAME} CONAN_PKG::robin-hood-hashing)

Create conanfile.txt in your source dir (don't forget to update the version)
```
[requires]
robin-hood-hashing/3.11.5

[generators]
cmake
```
Install and run Conan, then build your project as always:
```
pip install conan
mkdir build
cd build
conan install ../ --build=missing
cmake ../
cmake --build .
```
The robin-hood-hashing package in Conan is kept up to date by Conan contributors. If the version is out of date, please create an issue or pull request on the conan-center-index repository.

Benchmarks

Please see extensive benchmarks at Hashmaps Benchmarks. In short: robin_hood is always among the fastest maps and uses far less memory than std::unordered_map.

Design Choices

Two memory layouts. Data is either stored in a flat array, or with node indirection. Access for unordered_flat_map is extremely fast due to no indirection, but references to elements are not stable. It also causes allocation spikes when the map resizes, and will need plenty of memory for large objects. Node based map has stable references & pointers (NOT iterators! Similar to std::unordered_map) and uses const Key in the pair. It is a bit slower due to indirection. The choice is yours; you can either use robin_hood::unordered_flat_map or robin_hood::unordered_node_map directly. If you use robin_hood::unordered_map It tries to choose the layout that seems appropriate for your data.
Custom allocator. Node based representation has a custom bulk allocator that tries to make few memory allocations. All allocated memory is reused, so there won't be any allocation spikes. It's very fast as well.
Optimized hash. robin_hood::hash has custom implementations for integer types and for std::string that are very fast and falls back to std::hash for everything else.
Depends on good Hashing. For a really bad hash the performance will not only degrade like in std::unordered_map, the map will simply fail with an std::overflow_error. In practice, when using the standard robin_hood::hash, I have never seen this happening.

License

Licensed under the MIT License. See the LICENSE file for details.

by martinus

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

Top Related Projects

Quick Overview

Pros

Cons

Code Examples

Getting Started

Competitor Comparisons

Pros of jemalloc/jemalloc

Cons of jemalloc/jemalloc

Code Comparison

Pros of Abseil-CPP

Cons of Abseil-CPP

Code Comparison

Pros of Folly

Cons of Folly

Code Comparison

Pros of GSL

Cons of GSL

Code Comparison

Pros of Google Benchmark

Cons of Google Benchmark

Code Comparison

Convert designs to code with AI

README

âµ robin_hood unordered map & set

Installation & Usage

Direct Inclusion

Conan, the C/C++ Package Manager

Benchmarks

Design Choices

License

Top Related Projects

Convert designs to code with AI

âµ robin_hood unordered map & set