Top Related Projects
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Strongly typed JSON library for Rust
One of the fastest alternative JSON parser for Go that does not require schema
Rust parser combinator framework
Quick Overview
SIMD-JSON is a high-performance JSON parser and serializer for Rust that leverages Single Instruction, Multiple Data (SIMD) instructions for faster processing. It aims to provide a significant speed boost over traditional JSON parsing methods while maintaining compatibility with standard JSON formats.
Pros
- Extremely fast JSON parsing and serialization due to SIMD optimization
- Compatible with standard JSON formats
- Supports both validation and parsing in a single pass
- Provides fallback implementations for systems without SIMD support
Cons
- Full performance requires CPU support for specific SIMD instruction sets (e.g., AVX2, SSE4.2); the fallback implementation is significantly slower
- May have a steeper learning curve compared to standard JSON libraries
- Limited to Rust programming language
- Potential for increased code complexity when dealing with SIMD optimizations
Code Examples
- Parsing JSON:
use simd_json::prelude::*;
// simd-json parses in place, so the input must be a mutable byte buffer
let mut data = br#"{"name": "John Doe", "age": 30, "city": "New York"}"#.to_vec();
let json: simd_json::BorrowedValue = simd_json::to_borrowed_value(&mut data).unwrap();
println!("Name: {}", json["name"].as_str().unwrap());
println!("Age: {}", json["age"].as_u64().unwrap());
- Serializing to JSON:
use simd_json::prelude::*;
let mut obj = simd_json::OwnedValue::object();
obj.insert("name", "Jane Smith");
obj.insert("age", 28);
obj.insert("city", "London");
let json_string = simd_json::to_string(&obj).unwrap();
println!("JSON: {}", json_string);
- Validating JSON:
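// Assumption: no dedicated validation function is relied on here; a successful parse serves as the validity check
let mut data = br#"{"valid": true, "count": 42}"#.to_vec();
let is_valid = simd_json::to_tape(&mut data).is_ok();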
println!("Is valid JSON: {}", is_valid);
Getting Started
To use SIMD-JSON in your Rust project, add the following to your Cargo.toml:
[dependencies]
simd-json = "0.7"
Then, in your Rust code:
use simd_json::prelude::*;
fn main() {
    // simd-json parses in place, so it needs a mutable byte buffer
    let mut data = br#"{"key": "value"}"#.to_vec();
    let json: simd_json::BorrowedValue = simd_json::to_borrowed_value(&mut data).unwrap();
    println!("Parsed JSON: {:?}", json);
}
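For best performance, your CPU should support one of the SIMD instruction sets the library targets (AVX2 or SSE4.2 on x86, NEON on aarch64, simd128 on wasm). With runtime detection enabled (the default), SIMD-JSON selects the best available implementation on x86 and uses a significantly slower fallback when no supported instruction set is found.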
Competitor Comparisons
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Pros of simdjson
- More comprehensive and feature-rich JSON parsing library
- Supports a wider range of platforms and architectures
- Extensive documentation and active community support
Cons of simdjson
- Larger codebase and potentially higher memory footprint
- May have slightly higher compilation times due to its complexity
Code Comparison
simdjson:
auto json = R"( [1, 2, 3, 4] )"_padded;
ondemand::parser parser;
auto doc = parser.iterate(json);
for (auto value : doc) {
std::cout << value << std::endl;
}
simd-json:
use simd_json::prelude::*;
let mut json = br#"[1, 2, 3, 4]"#.to_vec();
let value: simd_json::OwnedValue = simd_json::to_owned_value(&mut json)?;
for item in value.as_array().unwrap() {
println!("{}", item);
}
Key Differences
- simdjson is primarily written in C++, while simd-json is written in Rust
- simdjson offers both DOM and On-Demand parsing, whereas simd-json offers DOM values, a tape API, and Serde-compatible deserialization
- simd-json is more tightly integrated with Rust's ecosystem and standard library
Performance Considerations
- Both libraries leverage SIMD instructions for high-performance parsing
- simdjson may have an edge in raw parsing speed, but simd-json integrates more seamlessly with Rust's memory model
- Actual performance can vary depending on the specific use case and data being parsed
Strongly typed JSON library for Rust
Pros of serde_json
- Widely adopted and well-established in the Rust ecosystem
- Seamless integration with Serde for serialization and deserialization
- Extensive documentation and community support
Cons of serde_json
- Generally slower performance compared to SIMD-accelerated parsing
- Higher memory usage during parsing and serialization operations
Code Comparison
serde_json:
use serde_json::Value;
let json_str = r#"{"name": "John", "age": 30}"#;
let parsed: Value = serde_json::from_str(json_str)?;
println!("Name: {}", parsed["name"]);
simd-json:
use simd_json::prelude::*;
let mut json_bytes = br#"{"name": "John", "age": 30}"#.to_vec();
let parsed = simd_json::to_borrowed_value(&mut json_bytes)?;
println!("Name: {}", parsed["name"]);
The main difference in usage is that simd-json requires mutable input for parsing, while serde_json can work with immutable strings. simd-json also provides a different API for accessing parsed values, but both libraries offer similar functionality for JSON manipulation.
serde_json is more suitable for projects prioritizing ecosystem compatibility and ease of use, while simd-json is ideal for applications requiring high-performance JSON parsing and processing.
One of the fastest alternative JSON parser for Go that does not require schema
Pros of jsonparser
- Pure Go implementation, making it easy to integrate and deploy in Go projects
- No external dependencies, reducing potential compatibility issues
- Supports streaming parsing for large JSON files
Cons of jsonparser
- Generally slower performance compared to simd-json
- Limited to the Go language (simd-json is likewise limited to Rust)
- Lacks SIMD optimizations for faster processing
Code Comparison
jsonparser:
import "github.com/buger/jsonparser"
value, err := jsonparser.GetString(data, "user", "name")
simd-json:
use simd_json::ValueAccess;
let value = parsed["user"]["name"].as_str().unwrap();
Both libraries provide ways to access nested JSON values, but simd-json leverages SIMD instructions for faster parsing and processing. jsonparser offers a more straightforward API for Go developers, while simd-json provides high-performance parsing for Rust.
simd-json excels in performance-critical applications, especially when dealing with large JSON datasets. However, jsonparser may be more suitable for Go projects that prioritize ease of use and don't require extreme parsing speeds.
Rust parser combinator framework
Pros of nom
- More versatile: nom is a general-purpose parser combinator library, suitable for parsing various data formats beyond JSON
- Highly customizable: allows creating complex parsers by combining smaller, reusable parsing functions
- Extensive documentation and examples, making it easier for newcomers to learn and use
Cons of nom
- Generally slower for JSON parsing compared to simd-json's specialized implementation
- Requires more code to implement a full JSON parser, as it's a toolkit rather than a ready-made solution
- Higher learning curve for those unfamiliar with parser combinators or functional programming concepts
Code Comparison
nom (parsing a simple key-value pair):
use nom::{
bytes::complete::tag,
character::complete::alphanumeric1,
sequence::separated_pair,
IResult,
};
fn parse_key_value(input: &str) -> IResult<&str, (&str, &str)> {
separated_pair(alphanumeric1, tag(":"), alphanumeric1)(input)
}
simd-json (parsing JSON):
use simd_json::prelude::*;
let mut json = br#"{"key": "value"}"#.to_vec();
let value = simd_json::to_borrowed_value(&mut json)?;
let key_value = value.get("key").unwrap().as_str().unwrap();
README
SIMD JSON for Rust
Rust port of extremely fast simdjson JSON parser with Serde compatibility.
simd-json is a Rust port of the simdjson C++ library. It follows most of the design closely, with a few exceptions to make it fit better into the Rust ecosystem.
Goals
The goal of the Rust port of simdjson is not to create a one-to-one copy, but to integrate the principles of the C++ library into a Rust library that plays well with the Rust ecosystem. As such we provide both compatibility with Serde as well as parsing to a DOM to manipulate data.
Performance
As a rule of thumb this library tries to get as close as possible to the performance of the C++ implementation (currently tracking 0.2.x, work in progress). However, in some design decisions, such as parsing to a DOM or a tape, ergonomics is prioritized over performance. In other places Rust makes it harder to achieve the same level of performance.
To take advantage of this library your system needs to support SIMD instructions. On x86, it will select the best available supported instruction set (avx2 or sse4.2) when the runtime-detection feature is enabled (default). On aarch64 this library uses the NEON instruction set. On wasm this library uses the simd128 instruction set when available. When no supported SIMD instructions are found, this library will use a fallback implementation, but this is significantly slower.
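The selection order described above can be sketched with the standard library alone. This is an illustration of runtime feature detection, not simd-json's actual dispatch code; the function name `best_simd_backend` and the returned labels are made up for this example.

```rust
/// Returns a label for the implementation a runtime dispatcher could pick
/// on the current machine. Hypothetical helper, for illustration only.
pub fn best_simd_backend() -> &'static str {
    // On x86/x86_64 the CPU is queried at runtime, best option first.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::is_x86_feature_detected!("sse4.2") {
            return "sse4.2";
        }
    }
    // On the other targets the choice is made at compile time.
    if cfg!(target_arch = "aarch64") {
        "neon"
    } else if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        "simd128"
    } else {
        "fallback"
    }
}
```

Calling `best_simd_backend()` on a given machine shows which branch a dispatcher of this shape would take; a result of "fallback" corresponds to the significantly slower non-SIMD path.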
Allocator
For best performance, we highly suggest using mimalloc or jemalloc instead of the system allocator used by default. Another recent allocator that works well (but we have yet to test it in production) is snmalloc.
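Swapping the allocator in Rust is done with the `#[global_allocator]` attribute. The sketch below shows the mechanism using only the standard library: `CountingAlloc` is a hypothetical wrapper around the system allocator, standing in for mimalloc or jemalloc so that the example needs no external crate.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator: forwards to the system allocator and counts calls.
pub struct CountingAlloc;

pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the program now goes through CountingAlloc.
// With the mimalloc crate as a dependency, the equivalent line would be:
// static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations observed so far.
pub fn allocations_so_far() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

Only one global allocator can be registered per binary, so this choice belongs in the application crate rather than in a library.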
Safety
simd-json uses a lot of unsafe code.
There are a few reasons for this:
- SIMD intrinsics are inherently unsafe. These uses of unsafe are inescapable in a library such as simd-json.
- We work around some performance bottlenecks imposed by safe Rust. These are avoidable, but at a performance cost. This is a more considered path in simd-json.
simd-json goes through extra scrutiny for unsafe code. These steps are:
- Unit tests - to test 'the obvious' cases, edge cases, and regression cases
- Structural constructive property-based testing - We generate random valid JSON objects to exercise the full simd-json codebase stochastically. Floats are currently excluded since slightly different parsing algorithms lead to slightly different results here. In short "is simd-json correct".
- Data-oriented property-based testing of string-like data - to assert that sequences of legal printable characters don't panic or crash the parser (they might, and often do, produce errors, since they are not valid JSON!)
- Destructive property-based testing - make sure that no illegal byte sequences crash the parser in any way
- Fuzzing - fuzz based on upstream & jsonorg simd pass/fail cases
This doesn't ensure complete safety, nor is it a bulletproof guarantee, but it does go a long way to assert that the library is of high production quality and fit for purpose for practical industrial applications.
Features
Various features can be enabled or disabled to tweak various parts of this library. Any features not mentioned here are for internal configuration and testing.
runtime-detection (default)
This feature allows selecting the optimal algorithm based on available features during runtime. It has no effect on non-x86 platforms. When neither AVX2 nor SSE4.2 is supported, it will fall back to a native Rust implementation.
Disabling this feature (with default-features = false) and setting RUSTFLAGS="-C target-cpu=native" will result in better performance but the resulting binary will not be portable across x86 processors.
serde_impl (default)
Enable Serde support. This consists of implementing serde::Serializer and serde::Deserializer, allowing types that implement serde::Serialize/serde::Deserialize to be constructed/serialized to BorrowedValue/OwnedValue.
In addition, this provides the same convenience functions that serde_json provides.
Disabling this feature (with default-features = false) will remove serde and serde_json from the dependencies.
swar-number-parsing (default)
Enables a parsing method that will parse 8 digits at a time for floats. This is a common pattern but comes at a slight performance hit if most of the floats have fewer than 8 digits.
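Parsing 8 digits at a time is usually done with SWAR ("SIMD within a register"): the eight ASCII bytes are loaded into one 64-bit integer and combined pairwise with three multiplications. The sketch below is the widely used form of that trick, shown as an illustration; it is not copied from simd-json, and the function names are chosen for this example.

```rust
/// Returns true if all eight bytes are ASCII digits ('0'..='9').
pub fn is_eight_digits(chars: [u8; 8]) -> bool {
    let val = u64::from_le_bytes(chars);
    // A digit byte has high nibble 3, and still has high nibble 3 after adding 6.
    ((val & 0xF0F0_F0F0_F0F0_F0F0)
        | ((val.wrapping_add(0x0606_0606_0606_0606) & 0xF0F0_F0F0_F0F0_F0F0) >> 4))
        == 0x3333_3333_3333_3333
}

/// Converts eight ASCII digits to their numeric value.
/// The caller is expected to have checked the input with `is_eight_digits`.
pub fn parse_eight_digits(chars: [u8; 8]) -> u32 {
    let mut val = u64::from_le_bytes(chars);
    // Strip the ASCII offset, then merge neighbours: 8 digits -> 4 two-digit values.
    val = (val & 0x0F0F_0F0F_0F0F_0F0F).wrapping_mul(2561) >> 8;
    // 4 two-digit values -> 2 four-digit values.
    val = (val & 0x00FF_00FF_00FF_00FF).wrapping_mul(6_553_601) >> 16;
    // 2 four-digit values -> 1 eight-digit value.
    ((val & 0x0000_FFFF_0000_FFFF).wrapping_mul(42_949_672_960_001) >> 32) as u32
}
```

The multipliers are 10·2^8 + 1, 100·2^16 + 1, and 10000·2^32 + 1. The fixed cost of these three steps is what makes the method slightly slower than a plain loop when numbers are mostly shorter than 8 digits.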
known-key
The known-key feature changes the hash mechanism for the DOM representation of the underlying JSON object from ahash to fxhash. The ahash hasher is faster at hashing and provides protection against DoS attacks that force multiple keys into a single hashing bucket. The fxhash hasher allows for repeatable hashing results, which in turn allows memoizing hashes for well-known keys and saving time on lookups. In workloads that are heavy on accessing some well-known keys, this can be a performance advantage.
The known-key feature is optional, disabled by default, and should be explicitly configured.
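The trade-off between repeatable and randomized hashing can be shown with standard-library hashers. In this sketch `DefaultHasher::new()` (fixed keys) stands in for fxhash and `RandomState` (per-instance random keys) stands in for ahash; the `KnownKey` type is hypothetical and is not simd-json's API.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

/// Repeatable hash: the same key gives the same value on every call.
pub fn fixed_hash(key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Randomized hash: each call builds a freshly keyed hasher, so the result
/// cannot be predicted by an attacker, and cannot be cached across maps.
pub fn seeded_hash(key: &str) -> u64 {
    let mut hasher = RandomState::new().build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical "known key": the hash is computed once and reused on lookups.
pub struct KnownKey {
    pub key: &'static str,
    pub hash: u64,
}

impl KnownKey {
    pub fn new(key: &'static str) -> Self {
        KnownKey { key, hash: fixed_hash(key) }
    }
}
```

Memoization only works in the repeatable case: a `KnownKey` built once stays valid for every object hashed with the same fixed hasher, which is the lookup saving the feature description refers to.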
value-no-dup-keys
This flag has no effect on simd-json itself but purely affects the Value structs.
The value-no-dup-keys feature flag enables stricter behavior for objects when deserializing into a Value. When enabled, the Value deserializer will remove duplicate keys in a JSON object and only keep the last one. If not set, duplicate keys are considered undefined behavior and Value will not make guarantees on its behavior.
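The "keep the last one" rule can be illustrated with a plain map. This is a sketch of the described semantics only, using `std::collections::HashMap` and a hypothetical helper name; it does not show how the Value deserializer is implemented.

```rust
use std::collections::HashMap;

/// Builds an object from key/value pairs in document order.
/// A later pair with the same key replaces the earlier one.
pub fn dedup_last_wins<'a>(pairs: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    let mut object = HashMap::new();
    for &(key, value) in pairs {
        // `insert` overwrites any existing entry for `key`.
        object.insert(key, value);
    }
    object
}
```

For an input equivalent to {"a": 1, "b": 2, "a": 3}, the resulting object has two entries and "a" maps to 3.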
big-int-as-float
The big-int-as-float feature flag treats very large integers that won't fit into u64 as f64 floats. This prevents parsing errors if the JSON you are parsing contains very large integers. Keep in mind that f64 loses some precision when representing very large numbers.
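The fallback and its precision cost can be sketched with the standard library's number parsing. The `Number` enum and `parse_unsigned` function below are hypothetical and cover only unsigned digit strings; they illustrate the behaviour described above rather than simd-json's parser.

```rust
#[derive(Debug, PartialEq)]
pub enum Number {
    U64(u64),
    F64(f64),
}

/// Parses a string of ASCII digits, falling back to f64 when it exceeds u64.
pub fn parse_unsigned(text: &str) -> Option<Number> {
    if text.is_empty() || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    match text.parse::<u64>() {
        Ok(n) => Some(Number::U64(n)),
        // Too large for u64: keep the magnitude, accept reduced precision.
        Err(_) => text.parse::<f64>().ok().map(Number::F64),
    }
}
```

With this sketch, 18446744073709551615 (u64::MAX) stays an exact integer, while 18446744073709551616 and 18446744073709551617 both become the same f64 value, which is the precision loss the feature description warns about.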
128bit
Add support for parsing and serializing 128-bit integers. This feature is disabled by default because such large numbers are rare in the wild and adding the support incurs a performance penalty.
beef
Enabling this feature can break dependencies in your dependency tree that are using simd-json.
Replace std::borrow::Cow with beef::lean::Cow. This feature is disabled by default, because it is a breaking change in the API.
ordered-float
By default the representation of Floats used in borrowed::Value and owned::Value is simply a value of f64.
This however has the normally-not-a-big-deal side effect of not having these Value types be std::cmp::Eq. This does, however, introduce some incompatibilities when offering simd-json as a quasi-drop-in replacement for serde-json.
So, this feature changes the internal representation of Floats to be an f64 wrapped by an Eq-compatible adapter.
This probably carries with it some small performance trade-offs, hence its enablement by feature rather than by default.
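The reason f64 cannot be Eq is that NaN is not equal to itself. A minimal sketch of an Eq-compatible wrapper, assuming bit-pattern comparison, looks like this; `EqFloat` is a hypothetical type for illustration, and the adapter simd-json actually uses may define equality differently (for example for NaN or signed zero).

```rust
/// Hypothetical Eq-compatible wrapper around f64.
#[derive(Debug, Clone, Copy)]
pub struct EqFloat(pub f64);

impl PartialEq for EqFloat {
    fn eq(&self, other: &Self) -> bool {
        // Compare bit patterns: reflexive even for NaN, which makes Eq sound.
        // Assumption of this sketch: 0.0 and -0.0 therefore compare unequal.
        self.0.to_bits() == other.0.to_bits()
    }
}

impl Eq for EqFloat {}

/// Compiles only for types that implement Eq; f64 itself would be rejected.
pub fn requires_eq<T: Eq>(_value: &T) {}
```

A value type that stores `EqFloat` instead of `f64` can derive Eq, at the price of the extra comparison logic hinted at in the description above.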
portable
Currently disabled
A highly experimental implementation of the algorithm using std::simd and up to 512-bit wide registers.
Usage
simd-json offers three main entry points for usage:
Values API
The values API is a set of optimized DOM objects that represent parsed JSON when the data has no known or fixed structure. simd-json has two versions of this:
Borrowed Values
use simd_json;
let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
let v: simd_json::BorrowedValue = simd_json::to_borrowed_value(&mut d).unwrap();
Owned Values
use simd_json;
let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
let v: simd_json::OwnedValue = simd_json::to_owned_value(&mut d).unwrap();
Serde Compatible API
use simd_json;
use serde_json::Value;
let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
let v: Value = simd_json::serde::from_slice(&mut d).unwrap();
Tape API
use simd_json;
let mut d = br#"{"the_answer": 42}"#.to_vec();
let tape = simd_json::to_tape(&mut d).unwrap();
let value = tape.as_value();
// try_get treats value like an object, returns Ok(Some(_)) because the key is found
assert!(value.try_get("the_answer").unwrap().unwrap() == 42);
// returns Ok(None) because the key is not found but value is an object
assert!(value.try_get("does_not_exist").unwrap() == None);
// try_get_idx treats value like an array, returns Err(_) because value is not an array
assert!(value.try_get_idx(0).is_err());
Other interesting things
There are also bindings for upstream simdjson available here
License
simd-json itself is licensed under either of Apache License, Version 2.0 or MIT license, at your option.
However, it ports a lot of code from simdjson, so their work and copyright on that should also be respected.
The Serde integration is based on serde-json, so their copyright should be respected as well.