Top Related Projects
Quick Overview
CityHash is a family of non-cryptographic hash functions designed by Google for string hashing. It's optimized for performance on modern CPUs and provides better distribution and collision resistance compared to some older hash functions. CityHash is particularly well-suited for hashing short strings.
Pros
- High performance, especially on modern CPUs with good instruction pipelining
- Better distribution and collision resistance compared to some older hash functions
- Multiple variants available for different use cases (32-bit, 64-bit, 128-bit)
- Optimized for short strings, which is common in many applications
Cons
- Not cryptographically secure, should not be used for security-sensitive applications
- May not be the best choice for very long strings or streaming data
- Less widely adopted compared to some other hash functions
- Limited ongoing development and updates
Code Examples
- Basic 64-bit hash:
#include <city.h>
#include <string>
std::string input = "Hello, World!";
uint64_t hash = CityHash64(input.c_str(), input.length());
- 128-bit hash:
#include <city.h>
#include <string>
std::string input = "Hello, World!";
uint128 hash = CityHash128(input.c_str(), input.length());
- 32-bit hash with seed:
#include <city.h>
#include <string>
std::string input = "Hello, World!";
uint32_t seed = 12345;
uint32_t hash = CityHash32WithSeed(input.c_str(), input.length(), seed);
Getting Started
To use CityHash in your C++ project:
-
Clone the repository:
git clone https://github.com/google/cityhash.git
-
Build the library:
cd cityhash ./configure make
-
Include the header in your C++ file:
#include <city.h>
-
Link against the built library when compiling your project:
g++ -I/path/to/cityhash/src -L/path/to/cityhash/src/.libs -lcityhash your_file.cpp -o your_program
Remember to replace /path/to/cityhash
with the actual path where you cloned the repository.
Competitor Comparisons
Hash function quality and speed tests
Pros of SMHasher
- More comprehensive test suite for hash functions
- Supports a wider range of hash algorithms
- Actively maintained with regular updates
Cons of SMHasher
- Larger codebase, potentially more complex to use
- May have higher resource requirements for running tests
Code Comparison
SMHasher:
void BulkSpeedTest(HashInfo* info)
{
const int trials = 100;
const int blocksize = 2048;
uint8_t * block = new uint8_t[blocksize];
memset(block,0,blocksize);
for(int i = 0; i < trials; i++)
{
info->hash(block,blocksize,0);
}
}
CityHash:
uint64 CityHash64(const char *buf, size_t len) {
if (len <= 32) {
if (len <= 16) {
return HashLen0to16(buf, len);
} else {
return HashLen17to32(buf, len);
}
} else if (len <= 64) {
return HashLen33to64(buf, len);
}
// For strings over 64 bytes we hash the end first, and then as we
// loop we keep 56 bytes of state: v, w, x, y, and z.
uint64 x = Fetch64(buf + len - 40);
uint64 y = Fetch64(buf + len - 16) + Fetch64(buf + len - 56);
uint64 z = HashLen16(Fetch64(buf + len - 48) + len, Fetch64(buf + len - 24));
// ... (additional code omitted for brevity)
}
SMHasher provides a more extensive testing framework, while CityHash focuses on implementing specific hash functions efficiently.
Extremely fast non-cryptographic hash algorithm
Pros of xxHash
- Faster performance, especially for small inputs
- Better distribution of hash values
- More actively maintained with regular updates
Cons of xxHash
- Less established in some ecosystems compared to CityHash
- May require more effort to integrate into existing systems using CityHash
Code Comparison
CityHash:
uint64 CityHash64(const char *buf, size_t len) {
if (len <= 32) {
if (len <= 16) {
return HashLen0to16(buf, len);
} else {
return HashLen17to32(buf, len);
}
}
// ... (additional code)
}
xxHash:
XXH64_hash_t XXH64(const void* input, size_t length, XXH64_hash_t seed) {
if (input == NULL) return 0;
if (length < 32) return XXH64_small(input, length, seed);
return XXH64_internal(input, length, seed);
}
Both hash functions use different approaches for small inputs, with xxHash having a simpler interface and potentially more optimized code for various input sizes. CityHash uses multiple specialized functions for different input lengths, while xxHash has a more streamlined approach with a single entry point and internal optimizations.
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual CopilotREADME
CityHash, a family of hash functions for strings.
Introduction
CityHash provides hash functions for strings. The functions mix the input bits thoroughly but are not suitable for cryptography. See "Hash Quality," below, for details on how CityHash was tested and so on.
We provide reference implementations in C++, with a friendly MIT license.
CityHash32() returns a 32-bit hash.
CityHash64() and similar return a 64-bit hash.
CityHash128() and similar return a 128-bit hash and are tuned for strings of at least a few hundred bytes. Depending on your compiler and hardware, it's likely faster than CityHash64() on sufficiently long strings. It's slower than necessary on shorter strings, but we expect that case to be relatively unimportant.
CityHashCrc128() and similar are variants of CityHash128() that depend on _mm_crc32_u64(), an intrinsic that compiles to a CRC32 instruction on some CPUs. However, none of the functions we provide are CRCs.
CityHashCrc256() is a variant of CityHashCrc128() that also depends on _mm_crc32_u64(). It returns a 256-bit hash.
All members of the CityHash family were designed with heavy reliance on previous work by Austin Appleby, Bob Jenkins, and others. For example, CityHash32 has many similarities with Murmur3a.
Performance on long strings: 64-bit CPUs
We are most excited by the performance of CityHash64() and its variants on short strings, but long strings are interesting as well.
CityHash is intended to be fast, under the constraint that it hash very well. For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't designed as a hash function and shouldn't be used as one. CityHashCrc128() is not a CRC, but it uses the CRC32 machinery.
On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about 5 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers around CityHashCrc256 and should have similar performance on long strings. (CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as thorough as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The fastest Murmur variant on that hardware, Murmur3F, peaks at about 2.4 bytes/cycle. We expect the peak speed of CityHash128 to dominate CityHash64, which is aimed more toward short strings or use in hash tables.
For long strings, a new function by Bob Jenkins, SpookyHash, is just slightly slower than CityHash128 on Intel x86-64 CPUs, but noticeably faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs and/or CPUs without the CRC instruction, SpookyHash may be just as good or better than any of the CityHash variants.
Performance on short strings: 64-bit CPUs
For short strings, e.g., most hash table keys, CityHash64 is faster than CityHash128, and probably faster than all the aforementioned functions, depending on the mix of string lengths. Here are a few results from that same hardware, where we (unrealistically) tested a single string length over and over again:
Hash Results
CityHash64 v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes Murmur2 (64-bit) 6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes Murmur3F 14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
We don't have CityHash64 benchmarks results for v1.1, but we expect the numbers to be similar.
Performance: 32-bit CPUs
CityHash32 is the newest variant of CityHash. It is intended for 32-bit hardware in general but has been mostly tested on x86. Our benchmarks suggest that Murmur3 is the nearest competitor to CityHash32 on x86. We don't know of anything faster that has comparable quality. The speed rankings in our testing: CityHash32 > Murmur3f > Murmur3a (for long strings), and CityHash32 > Murmur3a > Murmur3f (for short strings).
Installation
We provide reference implementations of several CityHash functions, written in C++. The build system is based on autoconf. It defaults the C++ compiler flags to "-g -O2", which is probably slower than -O3 if you are using gcc. YMMV.
On systems with gcc, we generally recommend:
./configure make all check CXXFLAGS="-g -O3" sudo make install
Or, if your system has the CRC32 instruction, and you want to build everything:
./configure --enable-sse4.2 make all check CXXFLAGS="-g -O3 -msse4.2" sudo make install
Note that our build system doesn't try to determine the appropriate compiler flag for enabling SSE4.2. For gcc it is "-msse4.2". The --enable-sse4.2 flag to the configure script controls whether citycrc.h is installed when you "make install." In general, picking the right compiler flags can be tricky, and may depend on your compiler, your hardware, and even how you plan to use the library.
For generic information about how to configure this software, please try:
./configure --help
Failing that, please work from city.cc and city*.h, as they contain all the necessary code.
Usage
The above installation instructions will produce a single library. It will contain CityHash32(), CityHash64(), and CityHash128(), and their variants, and possibly CityHashCrc128(), CityHashCrc128WithSeed(), and CityHashCrc256(). The functions with Crc in the name are declared in citycrc.h; the rest are declared in city.h.
Limitations
- CityHash32 is intended for little-endian 32-bit code, and everything else in the current version of CityHash is intended for little-endian 64-bit CPUs.
All functions that don't use the CRC32 instruction should work in little-endian 32-bit or 64-bit code. CityHash should work on big-endian CPUs as well, but we haven't tested that very thoroughly yet.
- CityHash is fairly complex. As a result of its complexity, it may not perform as expected on some compilers. For example, preliminary reports suggest that some Microsoft compilers compile CityHash to assembly that's 10-20% slower than it could be.
Hash Quality
We like to test hash functions with SMHasher, among other things. SMHasher isn't perfect, but it seems to find almost any significant flaw. SMHasher is available at http://code.google.com/p/smhasher/
SMHasher is designed to pass a 32-bit seed to the hash functions it tests. No CityHash function is designed to work that way, so we adapt as follows: For our functions that accept a seed, we use the given seed directly (padded with zeroes); for our functions that don't accept a seed, we hash the concatenation of the given seed and the input string.
The CityHash functions have the following flaws according to SMHasher:
(1) CityHash64: none
(2) CityHash64WithSeed: none
(3) CityHash64WithSeeds: did not test
(4) CityHash128: none
(5) CityHash128WithSeed: none
(6) CityHashCrc128: none
(7) CityHashCrc128WithSeed: none
(8) CityHashCrc256: none
(9) CityHash32: none
Some minor flaws in 32-bit and 64-bit functions are harmless, as we expect the primary use of these functions will be in hash tables. We may have gone slightly overboard in trying to please SMHasher and other similar tests, but we don't want anyone to choose a different hash function because of some minor issue reported by a quality test.
For more information
http://code.google.com/p/cityhash/
cityhash-discuss@googlegroups.com
Please feel free to send us comments, questions, bug reports, or patches.
Top Related Projects
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot