Convert Figma logo to code with AI

rurban logosmhasher

Hash function quality and speed tests

1,876
180
1,876
37

Top Related Projects

9,261

Extremely fast non-cryptographic hash algorithm

Quick Overview

SMHasher is a test suite designed to test the quality of hash functions. It provides a comprehensive set of tests to evaluate various properties of hash functions, including distribution, avalanche effect, and collision resistance. The project is maintained by Reini Urban and is a fork of the original SMHasher by Austin Appleby.

Pros

  • Comprehensive test suite for evaluating hash functions
  • Supports a wide range of hash algorithms
  • Actively maintained and updated
  • Useful for both cryptographic and non-cryptographic hash functions

Cons

  • Complex setup and compilation process
  • Limited documentation for new users
  • Requires understanding of hash function properties to interpret results
  • Resource-intensive for running full test suite on multiple hash functions

Code Examples

// Example 1: Using SMHasher to test a custom hash function
#include "Platform.h"
#include "Hashes.h"
#include "KeysetTest.h"
#include "SpeedTest.h"
#include "AvalancheTest.h"

void CustomHash(const void * key, int len, uint32_t seed, void * out) {
    // Your custom hash implementation here
}

int main(int argc, char ** argv) {
    HashInfo hi;
    hi.hash = CustomHash;
    hi.name = "CustomHash";
    hi.hashbits = 32;
    hi.verification = 0;  // Set this to the expected verification value

    test_hash(&hi);
    return 0;
}
// Example 2: Running specific tests on a hash function
#include "Platform.h"
#include "Hashes.h"
#include "KeysetTest.h"

int main(int argc, char ** argv) {
    HashInfo hi;
    hi.hash = MurmurHash3_x86_32;
    hi.name = "MurmurHash3_x86_32";
    hi.hashbits = 32;

    bool result = TinyKeyTest(&hi, 100000);
    printf("TinyKeyTest result: %s\n", result ? "PASS" : "FAIL");

    return 0;
}
// Example 3: Comparing speed of multiple hash functions
#include "Platform.h"
#include "Hashes.h"
#include "SpeedTest.h"

int main(int argc, char ** argv) {
    HashInfo hi1, hi2;
    hi1.hash = MurmurHash3_x86_32;
    hi1.name = "MurmurHash3_x86_32";
    hi1.hashbits = 32;

    hi2.hash = FNV1a;
    hi2.name = "FNV1a";
    hi2.hashbits = 32;

    BulkSpeedTest(&hi1, 1024);
    BulkSpeedTest(&hi2, 1024);

    return 0;
}

Getting Started

  1. Clone the repository:

    git clone https://github.com/rurban/smhasher.git
    
  2. Build the project:

    cd smhasher
    cmake .
    make
    
  3. Run the test suite:

    ./SMHasher
    
  4. To test a specific hash function:

    ./SMHasher MurmurHash3_x86_32
    

Competitor Comparisons

9,261

Extremely fast non-cryptographic hash algorithm

Pros of xxHash

  • Faster performance, especially for short inputs
  • Simpler implementation, focusing on speed and efficiency
  • Actively maintained with regular updates and improvements

Cons of xxHash

  • Limited to non-cryptographic hash functions
  • Fewer hash algorithm implementations compared to SMHasher

Code Comparison

xxHash (XXH64 function):

XXH_FORCE_INLINE XXH64_hash_t
XXH64_round(XXH64_hash_t acc, XXH64_hash_t input)
{
    acc += input * PRIME64_2;
    acc  = XXH_rotl64(acc, 31);
    acc *= PRIME64_1;
    return acc;
}

SMHasher (MurmurHash3 function):

inline uint64_t fmix64 ( uint64_t k )
{
  k ^= k >> 33;
  k *= BIG_CONSTANT(0xff51afd7ed558ccd);
  k ^= k >> 33;
  k *= BIG_CONSTANT(0xc4ceb9fe1a85ec53);
  k ^= k >> 33;
  return k;
}

Summary

xxHash focuses on speed and simplicity, making it ideal for non-cryptographic hashing tasks. SMHasher provides a broader range of hash algorithms and testing tools, making it more suitable for hash function analysis and comparison. Both projects have their strengths, with xxHash excelling in performance and SMHasher offering more comprehensive testing capabilities.

Convert Figma logo designs to code with AI

Visual Copilot

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

SMhasher

Linux Build status Windows Build status FreeBSD Build status

Hash functionMiB/seccycl./hashcycl./mapsizeQuality problems
donothing3211536809.554.00-13bad seed 0, test NOP
donothing6412269832.974.00-13bad seed 0, test NOP
donothing12811961841.484.08-13bad seed 0, test NOP
NOP_OAAT_read6411652662.9714.00-47test NOP
BadHash794.6872.80-47bad seed 0, test FAIL
sumhash10184.6729.40-363bad seed 0, test FAIL
sumhash3244755.8723.46-863UB, test FAIL
multiply_shift8286.6325.39209.06 (2)345bad seeds & 0xfffffff0, fails most tests
pair_multiply_shift5728.0539.67218.48 (12)609fails most tests
--------------------------
crc32369.88142.33396.81 (101)422insecure, 8590x collisions, distrib, PerlinNoise
md5_32359.43638.92865.21 (6)4419
md5_64360.20640.38869.55 (6)4419
md5-128344.63669.59856.34 (5)4419
sha1_32353.031385.801759.94 (5)5126Sanity, Cyclic low32, 36.6% distrib
sha1_64353.031385.801759.94 (5)5126Sanity, Cyclic low32, 36.6% distrib
sha1-160364.951470.551794.16 (13)5126Comb/Cyclic low32
sha2-224147.131354.811589.92 (12)Comb low32
sha2-224_64147.601360.101620.93 (13)Cyclic low32
sha2-256147.801374.901606.06 (16)
sha2-256_64148.011376.341624.71 (16)
sha1ni1632.88170.77379.84 (5)989insecure,sanity, Permutation, Zeroes, machine-specific
sha1ni_321583.50172.17387.70 (4)989machine-specific
sha2ni-2561556.66180.75393.79 (5)4241insecure,sanity, Permutation, Zeroes, machine-specific
sha2ni-256_641508.82184.89393.07 (6)4241Zeroes, machine-specific
blake3_c1298.04354.25563.63 (4)no 32bit portability
rmd128294.42712.67930.65 (4)
rmd160202.161045.791287.74 (16)Cyclic hi32
rmd256366.21615.39830.04 (7)
edonr224863.77304.76496.66 (3)
edonr256870.97296.40517.82 (6)
blake2s-128295.30698.091059.24 (51)
blake2s-160215.011026.741239.54 (11)
blake2s-224207.061063.861236.50 (20)
blake2s-256215.281014.881230.38 (28)
blake2s-256_64211.521044.221228.43 (8)
blake2b-160356.081236.841458.15 (12)
blake2b-224356.591228.501425.87 (16)
blake2b-256355.971232.221443.31 (19)
blake2b-256_64356.971222.761435.03 (9)
asconhashv12144.98885.021324.23 (38)4341
asconhashv12_64159.33402.54473.15 (4)6490
sha3-256100.583877.184159.79 (37)PerlinNoise
sha3-256_64100.573909.004174.63 (16)PerlinNoise
hasshe22879.9362.47272.34 (2)445Permutation,TwoBytes,Zeroes,Seed
poly_1_mersenne1278.3763.42244.41 (1)479fails most tests
poly_2_mersenne1391.9069.97255.44 (3)479
poly_3_mersenne1320.1580.81263.12 (2)479
poly_4_mersenne1393.9389.07262.97 (2)479
tabulation325819.4040.16233.00 (3)848collisions
tabulation7407.9439.83240.25 (4)554
crc32_hw5537.7940.80225.65 (3)653insecure, 100% bias, collisions, distrib, BIC, machine-specific (SSE4.2/NEON)
crc32_hw17626.1750.21228.50 (2)671insecure, 100% bias, collisions, distrib, BIC, machine-specific (x86 SSE4.2)
crc64_hw5579.1740.87202.19 (1)652insecure, 100% bias, collisions, distrib, BIC, machine-specific (SSE4.2/NEON)
crc32_pclmul7963.20106.02567.70 (3)insecure, 100% bias, collisions, distrib, BIC, machine-specific (x86 SSE4.2+PCLMUL)
crc64_jones11673.6783.00249.16 (2)insecure, 100% bias, collisions, distrib, BIC, machine-specific
crc64_jones22968.07314.60253.01 (3)insecure, 100% bias, collisions, distrib, BIC, machine-specific
crc64_jones33398.80302.59258.16 (8)insecure, 100% bias, collisions, distrib, BIC, machine-specific
crc64_jones3512.6680.70251.40 (2)insecure, 100% bias, collisions, distrib, BIC, machine-specific
o1hash11530366.9218.40206.94 (1)101insecure, no seed, zeros, fails all tests
fibonacci15409.6222.67872.83 (3)1692UB, zeros, fails all tests
FNV1a754.7075.28226.30 (2)204bad seed, zeros, fails all tests
FNV1A_Totenschiff6166.8526.60235.71 (2)270UB, zeros, fails all tests
FNV1A_Pippip_Yurii6115.2928.03233.52 (2)147UB, sanity, fails all tests
FNV1a_YT13481.7030.67233.71 (7)321bad seed, UB, fails all tests
FNV25630.8732.09207.08 (1)278fails all tests
FNV64747.7074.18189.18 (1)79fails all tests
FNV128408.59130.82299.47 (20)171fails all tests
k-hash322227.9053.85255.78 (2)808insecure, zeros, UB, bad seeds, fails all tests
k-hash642492.6648.18242.58 (2)609insecure, zeros, UB, bad seeds, fails all tests
fletcher215410.1220.72345.55 (5)248bad seed 0, UB, fails all tests
fletcher415603.6821.24320.09 (2)371bad seed 0, UB, fails all tests
bernstein1032.0059.04225.09 (3)41bad seed 0, fails all tests
sdbm772.0171.67220.09 (2)41bad seed 0, fails all tests
x17765.0175.61225.81 (2)7999.98% bias, fails all tests
libiberty618.3986.58220.95 (2)37insecure, 100% bias, fails all tests, bad seed
gcc612.1088.44224.62 (2)39insecure, 100% bias, fails all tests, bad seed
JenkinsOOAT615.60111.23251.08 (2)153bad seed 0, 53.5% bias, fails all tests
JenkinsOOAT_perl631.7993.13232.44 (1)65bad seed 0, 1.5-11.5% bias, 7.2x collisions, BIC, LongNeighbors
MicroOAAT730.8978.25236.07 (3)68100% bias, distrib, BIC
pearsonhash64439.85123.07213.68 (1)Avalanche, Seed, SSSE3 only. broken MSVC
pearsonhash128438.49123.81212.89 (2)Avalanche, Seed, SSSE3 only. broken MSVC
pearsonhash256440.63120.72224.63 (2)Avalanche, Seed, SSSE3 only. broken MSVC
VHASH_3213084.3765.45280.14 (2)1231sanity, Seed, MomentChi2
VHASH_6413217.6464.90270.04 (2)1231sanity, Seed, Sparse
farsh3227583.8965.65266.64 (2)944insecure: AppendedZeroes, collisions+bias, MomentChi2, LongNeighbors
farsh6413558.69113.74327.26 (3)944insecure: AppendedZeroes, collisions+bias, MomentChi2, LongNeighbors
farsh1287055.39229.42375.57 (4)944insecure: AppendedZeroes, collisions+bias, permut,combin,2bytes,zeroes,PerlinNoise
farsh2563466.26444.74610.23 (4)944insecure: AppendedZeroes, collisions+bias, permut,combin,2bytes,zeroes,PerlinNoise
jodyhash321762.4642.66236.09 (2)102bias, collisions, distr, BIC LongNeighbors
jodyhash644861.8444.05234.35 (2)118bias, collisions, distr, BIC, LongNeighbors
lookup32474.0940.19238.22 (3)341UB, 28% bias, collisions, 30% distr, BIC
superfast2085.7949.89230.76 (3)210UB, bad seed 0, 91% bias, 5273.01x collisions, 37% distr, BIC
MurmurOAAT513.75105.23244.81 (4)47bad seed 0, collisions, 99.998% distr., BIC, LongNeighbors
Crap83081.0437.15234.94 (2)342UB, 2.42% bias, collisions, 2% distrib
Murmur11955.3648.84236.25 (2)358UB, 1 bad seed, 511x collisions, Diff, BIC
Murmur23082.0341.62250.72 (4)358UB, 1 bad seed, 1.7% bias, 81x coll, 1.7% distrib, BIC
Murmur2A2850.4046.60284.58 (13)407UB, 1 bad seed, 12.7% bias, LongNeighbors
Murmur2B6039.9638.70212.23 (1)433UB, 1.8% bias, collisions, 3.4% distrib, BIC
Murmur2C3802.6849.82220.13 (2)444UB, 2^32 bad seeds, 91% bias, collisions, distr, BIC, LongNeighbors
Murmur3A3027.3048.99234.49 (2)351UB, 1 bad seed, Moment Chi2 69
PMurHash323001.4448.99240.35 (3)18621 bad seed, Moment Chi2 69
Murmur3C4824.9557.39243.91 (2)859UB, LongNeighbors, Text, DiffDist
mirhash32low6168.0438.35234.32 (2)1112UB, 4 bad seeds, Cyclic, LongNeighbors, machine-specific (32/64 differs)
PMPML_326904.3044.25233.59 (2)1084Avalanche >512, unseeded: Seed, BIC, MomentChi2, PerlinNoise
PMPML_6410030.6749.22239.07 (5)1305unseeded: Seed, MomentChi2, BIC
xxHash326064.3748.86234.27 (3)738LongNeighbors, collisions with 4bit diff, MomentChi2 220
metrohash6414209.1440.85225.16 (2)624UB, LongNeighbors, BIC
metrohash64_114495.3040.83213.00 (2)624UB, LongNeighbors, BIC, MomentChi2
metrohash64crc_18010.9044.43213.94 (2)632UB, Cyclic 8/8 byte, DiffDist, BIC, MomentChi2, machine-specific (SSE4.2/NEON)
metrohash64crc_27939.6544.84222.01 (2)632UB, Cyclic 8/8 byte, DiffDist, BIC, machine-specific (SSE4.2/NEON)
cmetrohash64_1o14678.5640.33216.44 (2)3506UB, LongNeighbors, BIC, MomentChi2
cmetrohash64_114332.7541.04216.05 (2)652UB, LongNeighbors, BIC, MomentChi2
City64noSeed14023.2733.37223.59 (2)1038Avalanche, Sparse, TwoBytes, MomentChi2, Seed
City6414390.0946.69231.99 (2)1120Sparse, TwoBytes
t1ha1_64le13425.0331.37221.97 (2)517Avalanche
t1ha1_64be12002.5031.57226.52 (2)555Avalanche
t1ha0_32le7276.1649.11236.39 (2)509Sparse, LongNeighbors
t1ha0_32be6860.8750.16241.26 (2)533Sparse, LongNeighbors
t1ha0_aes_avx127881.5236.78227.13 (2)533Sparse, LongNeighbors
t1ha2_stream13673.2281.12263.88 (3)1665Sparse, Permutation, LongNeighbors
t1ha2_stream12813913.4394.60296.15 (4)1665Sparse, Permutation, LongNeighbors
aesnihash5365.6057.21255.87 (3)1209fails many tests, machine-specific (x64 AES-NI)
aesni-hash-peterrk29107.7328.86217.57 (1)machine-specific (x64 AES-NI)
falkhash52401.48122.70316.79 (4)264Sparse, LongNeighbors, machine-specific (x64 AES-NI)
MeowHash29969.8164.90273.79 (8)1764Sparse, invertible, machine-specific (x64 AES-NI)
MeowHash64low29438.4563.76269.41 (4)1764Sparse, invertible, machine-specific (x64 AES-NI)
MeowHash32low30562.5463.77283.26 (3)1764Sparse, invertible, machine-specific (x64 AES-NI)
--------------------------------------
tifuhash_6435.601679.521212.75 (15)276Cyclic low32
floppsyhash35.721868.921411.07 (7)623
beamsplitter789.22682.451150.33 (26)4203UB
discohash14152.62202.14414.34 (4)1294
discohash1-1284064.39231.06430.94 (6)1294
discohash24026.84204.37408.13 (6)1294
discohash2-1284153.48226.54429.24 (4)1294
discoNONG3661.23397.81651.23 (54)bad seeds
chaskey1142.38113.98288.04 (1)1609PerlinNoise
SipHash953.51147.32332.02 (5)1071
HalfSipHash1128.8580.48260.42 (2)700zeroes
GoodOAAT744.6787.23243.36 (2)237
pearsonbhash641749.9899.84257.11 (2)683
pearsonbhash1281656.68106.78276.92 (3)1134
pearsonbhash2561418.61123.68301.26 (3)844
prvhash64_64m3107.7747.31236.76 (2)349
prvhash64_643057.7947.83244.16 (2)384
prvhash64_1283340.8767.90264.44 (2)718
prvhash64s_646724.41266.09438.90 (3)2640
prvhash64s_1286703.66326.46508.67 (3)2799
SipHash131803.96107.93304.05 (3)7780.9% bias
TSip4330.8752.43245.31 (2)519!msvc
seahash8240.5858.61260.35 (1)871PerlinNoise, !msvc
seahash32low8245.9158.61264.07 (2)871PerlinNoise 32, !msvc
clhash22932.3667.22262.78 (2)1809PerlinNoise, machine-specific (x64 SSE4.2)
HighwayHash646242.5899.55248.41 (3)2546
Murmur3F7104.0852.69216.34 (2)699UB
MUM9631.2634.78219.14 (2)1912UB, too many bad seeds, machine-specific (32/64 differs)
MUMlow8532.1735.95238.37 (2)1912UB, 5 bad seeds
xmsx322105.0345.35238.92 (3)1922 bad seeds
mirhash5793.8538.15212.91 (2)1112UB, 2^36 bad seeds, LongNeighbors, machine-specific (32/64 differs)
mirhashstrict3587.7949.85219.49 (2)1112
mirhashstrict32low3642.7750.16233.34 (3)11121 bad seed, MomentChi2 9
fasthash326107.0540.49237.83 (2)566UB
fasthash645600.1138.04219.71 (2)509UB
aesni31185.9829.45226.75 (2)519machine-specific (x64 AES-NI)
aesni-low31027.3929.47232.54 (2)519machine-specific (x64 AES-NI)
mx39332.9947.07221.61 (2)734UB
pengyhash13347.1774.79278.74 (3)421
City325745.8352.44242.69 (2)1319
City64low13119.1747.92251.81 (2)1120
City12814472.2088.71285.61 (10)1841
CityCrc12812255.0589.25281.85 (2)295
CityCrc25612428.45164.93358.73 (2)
FarmHash3222136.1446.75258.73 (2)11489machine-specific (x64 SSE4/AVX)
FarmHash6412929.5446.82238.50 (2)3758
FarmHash12814454.1268.09254.04 (2)163
farmhash32_c22150.1346.08251.43 (3)762machine-specific (x64 SSE4/AVX)
farmhash64_c12925.7346.76239.47 (2)3688
farmhash128_c14284.6968.80256.66 (3)1890
metrohash64_214359.3941.09215.53 (2)627UB, LongNeighbors
cmetrohash64_214498.2040.87219.65 (2)655LongNeighbors
metrohash12815847.6372.33266.54 (2)773UB, LongNeighbors
metrohash128_115556.4073.49259.05 (2)773UB, LongNeighbors
metrohash128_215408.5674.36259.57 (2)773UB, LongNeighbors
metrohash128crc_18182.0277.38256.80 (2)723UB, machine-specific (SSE4.2/NEON)
metrohash128crc_27996.3378.81262.53 (2)723UB, machine-specific (SSE4.2/NEON)
xxHash6411176.7249.67230.24 (6)1999
Spooky3213623.8455.45248.11 (3)2221UB
Spooky6413400.9555.06244.86 (3)2221UB
Spooky12813621.0059.05235.42 (2)2221UB
SpookyV2_3213626.5156.19256.57 (19)2069
SpookyV2_6413401.1356.18246.33 (11)2069
SpookyV2_12812252.6458.94235.46 (2)2069
ahash649862.6227.32181.68 (1)412rust
xxh320853.7429.46220.07 (2)744DiffDist bit 7 w. 36 bits, BIC
xxh3low20568.9629.98243.82 (5)756
xxh12819259.9931.58228.84 (2)1012
xxh128low19555.8830.85227.83 (2)1012
t1ha2_atonce14275.6037.02224.12 (2)541Zeroes low3
t1ha2_atonce12814059.8556.44251.53 (3)613LongNeighbors
t1ha0_aes_noavx27335.4737.34230.59 (2)925LongNeighbors, machine-specific (x86 AES-NI)
t1ha0_aes_avx127881.5236.78227.13 (2)843LongNeighbors, machine-specific (x64 AVX)
t1ha0_aes_avx256581.8336.51223.59 (1)792LongNeighbors, machine-specific (x64 AVX2)
wyhash322532.8948.40484.57 (1)4264 bad and broken seeds, 32-bit
wyhash32low23104.8528.56239.71 (4)4745 bad seeds
wyhash22640.5328.91229.00 (2)474
w1hash14208.5626.85221.76 (2)
rapidhash22147.0929.07214.80 (2)574
rapidhash_unrolled21723.1329.40220.97 (3)782
umash3221999.1941.14239.07 (2)1530
umash32_hi22347.3341.22251.12 (4)1530
umash6421963.5441.12228.34 (1)1530
umash12813629.6641.65228.25 (1)1530
halftime_hash644801.7999.05310.02 (2)2911
halftime_hash12818220.3494.31307.50 (1)2462
halftime_hash25618249.3297.56322.85 (2)2622
halftime_hash51210906.18118.54326.76 (3)3550
nmhash3212676.0857.09255.20 (2)2445
nmhash32x13072.6442.09288.12 (3)1494
k-hashv328393.0953.95252.98 (3)1280
k-hashv649251.0551.72251.83 (2)1279
komihash12179.7433.23224.80 (2)1323
polymur9913.5341.68232.56 (3)1128
gxhash3247943.0837.71251.38 (2)736AES only
gxhash6448919.7336.61236.98 (3)720AES only

The sortable table variants:

Summary

I added some SSE assisted hashes and fast intel/arm CRC32-C, AES and SHA HW variants. See also the old https://github.com/aappleby/smhasher/wiki, the improved, but unmaintained fork https://github.com/demerphq/smhasher, and the new improved version SMHasher3 https://gitlab.com/fwojcik/smhasher3.

So the fastest hash functions on x86_64 without quality problems are:

  • rapidhash (an improved wyhash)
  • xxh3low
  • wyhash
  • umash (even universal!)
  • ahash64
  • t1ha2_atonce
  • komihash
  • FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
  • halftime_hash128
  • Spooky32
  • pengyhash
  • nmhash32
  • mx3
  • MUM/mir (different results on 32/64-bit archs, lots of bad seeds to filter out)
  • fasthash32

Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit.

Typical median key size in perl5 is 20, the most common 4. Similar for all other dynamic languages. See github.com/rurban/perl-hash-stats

When used in a hash table the instruction cache will usually beat the CPU and throughput measured here. In my tests the smallest FNV1A beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions will lead to more collisions, the overall speed advantage and inline-ability beats the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simplest Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.

The fast hash functions tested here are recommendable as fast for file digests and maybe bigger databases, but not for 32bit hash tables. The "Quality problems" lead to less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks, just the 2nd sanity AppendZeroes test against \0 invariance is security relevant.

Columns

MiB/sec: The average of the Bulk key speed test for alignments 0-7 with 262144-byte keys. The higher the better.

cycl./hash: The average of the Small key speed test for 1-31 byte keys. The smaller the better.

cycl./map: The result of the Hashmap test for /usr/dict/words with fast C++ hashmap get queries, with the standard deviation in brackets. This tests the inlinability of the hash function in practise (see size). The smaller the better.

size: The object size in byte on AMD64. This affects the inlinability in e.g. hash tables. The smaller the better.

Quality problems: See the failures in the linked doc. The less the better.

Other

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here. We list some known attacks at GH #186.

Such an attack avoidance cannot be the problem of the hash function, but only the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from language (mis-)features, side-channel attacks, collision timings and independly the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists. Linked lists chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of SipHash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. SipHash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions need 2-4m for all these hashes. siphash being the slowest needs max 4m, other typically max 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Provable secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as swisstable/folly-F14, Robin Hood or Cuckoo hashing with collision counting.

One more note regarding security: Nowadays even SHA1 can be solved in a solver, like Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). All hash functions with less than 160 bits tested here cannot be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity Zero test.

CRYPTO

Our crypto hashes are hardened with an added size_t seed, mixed into the initial state, and the versions which require zero-padding are hardened by adding the len also, to prevent from collisions with AppendedZeroes for the padding. The libtomcrypt implementations already provide for that, but others might not. Without, such crypto hash functions are unsuitable for normal tasks, as it's trivial to create collisions by padding or bad seeds.

The official NIST hash function testsuite does not do such extensive statistical tests, to search for weak ranges in the bits. Also crypto does not change the initial state, which we do here for our random 32bit seed. Crypto mostly cares about unreversable key -> hash functions without changing the initial fixed state and timings/sidechannel attacks.

The NIST "Cryptographic Algorithm Validation Program" (CAVP) involves the testing of the implementations of FIPS-approved and NIST-recommended cryptographic algorithms. During the NIST SHA-3 competition, the testing methodology was borrowed from the "CAVP", as the KATs and MCTs of the SHA-3 Competition Test Suite were based on the CAVP tests for SHA-2. In addition to this, the “Extremely Long Message Test,” not present in the CAVP for SHA-2, required the submitters to generate the hash value corresponding to a message with a length of 1 GiB. “NIST - Cryptographic Algorithm Validation Program (CAVP),” June 2017. Available: http://csrc.nist.gov/groups/STM/cavp (No testing source code provided, just high-level descriptions)

Two other independent third party testsuites found an extensive number of bugs and weaknesses in the SHA3 candidates. "Finding Bugs in Cryptographic Hash Function Implementations", Nicky Mouha, Mohammad S Raunak, D. Richard Kuhn, and Raghu Kacker, 2017. https://eprint.iacr.org/2017/891.pdf

Maybe independent researchers should come together to do a better public SHA-4 round, based on better and more testing methods, open source code for the tests, and using standard industry practices, such as valgrind, address-sanitizer and ubsan to detect obvious bugs.

PROBLEMS

  • Bad Seeds

    Hash functions are typically initialized with a random seed. But some seed values may lead to bad hash functions, regardless of the key. In the regular case with random seeds the probablity of such bad seeds is very low, like 2^32 or 2^64. A practical application needs to know if bad seeds exist and choose another one. See e.g. mirhash_seed_init() and mirhash_bad_seeds() in Hashes.h. Note that a bad seed is not really a problem when you skip this seed during initialization. It can still be a GOOD or recommended hash function. But a bad seed of 0 leading to collisions is considered a bug, a bad hash function.

    We test for internal secrets, if they will be multiplied with 0. This is also called "blinding multiplication". main.cpp lists some secrets for each hash function we test against. The function <hash>_bad_seeds() lists the confirmed bad seeds.

    Special care needs to be taken for crc, most FNV1 variants, fletcher, Jenkins. And with GOOD hashes most MUM variants, like mirhash, MUM, wyhash.

    Independently from this, when the attacker knows the seed it will lead to DDOS attacks. Even with crypto hashes in power2 hashtables.

Typical undefined behaviour (UB) problems:

  • Misaligned

    Many word-wise hashes (in opposite to safe byte-wise processing) don't check the input buffer for proper word alignment, which will fail with ubsan or Sparc. word being int32_t or int64_t or even more. On some old RISC hardware this will be a BUS error, you can even let Intel HW generate such a bus error by setting some CPU flag. But generally using misaligned accesses is fine.

    These are: mx3, Spooky, mirhash (but not strict), MUM, fasthash, Murmur3*, Murmur2*, metrohash* (all but cmetro*), Crap8, beamsplitter, lookup3, fletcher4, fletcher2, all sanmayce FNV1a_ variants (FNV1a_YT, FNV1A_Pippip_Yurii, FNV1A_Totenschiff, ...), fibonacci.

    The usual mitigation is to check the buffer alignment either in the caller, provide a pre-processing loop for the misaligned prefix, or copy the whole buffer into a fresh aligned area. Put that extra code inside #ifdef HAVE_ALIGNED_ACCESS_REQUIRED.

  • oob - Out of bounds

    Some hash function assume a padded input buffer which can be accessed past its length up to the word size. This allows for faster loop processing, as no 2nd loop or switch table for the rest is needed, but it requires a cooperative calling enviroment and is as such considered cheating.

  • Signed integer overflow

    A simple type error, this hash needs to use unsigned integer types internally, to avoid undefined and inconsistent behaviour. i.e. SuperFastHash: signed integer overflow: -2147483641 + -113 cannot be represented in type 'int'

  • shift exponent overflow

    With: FNV1A_Pippip_Yurii, FNV1A_Totenschiff, pair_multiply_shift, sumhash32 shift exponent 64 is too large for 64-bit type 'long unsigned int'