encoding
Go package containing implementations of efficient encoding, decoding, and validation APIs.
Top Related Projects
Optimized Go Compression Packages
The Snappy compression format in the Go programming language.
Golang port of simdjson: parsing gigabytes of JSON per second
Fast JSON parser and validator for Go. No custom structs, no code generation, no reflection
Quick Overview
The segmentio/encoding repository is a Go package that provides efficient encoding and decoding of data formats, most notably JSON (including newline-delimited JSON), with additional sub-packages for ISO 8601 timestamps, Protocol Buffers, and Thrift. It aims to offer high-performance alternatives to the standard library implementations, with a focus on speed and memory efficiency.
Pros
- High performance: Significantly faster than standard library implementations
- Memory efficient: Reduces allocations and optimizes memory usage
- Flexible: Supports multiple data formats (JSON, NDJSON, Protocol Buffers, Thrift, ISO 8601)
- API compatibility: Largely compatible with standard library interfaces
Cons
- Limited format support: Focuses on a few specific formats
- Potential compatibility issues: May not support all edge cases of standard library implementations
- Learning curve: Requires understanding of the package's specific APIs and optimizations
- Maintenance concerns: Less frequently updated compared to standard library
Code Examples
- JSON Encoding:
import "github.com/segmentio/encoding/json"
type Person struct {
Name string `json:"name"`
Age int `json:"age"`
}
p := Person{Name: "John Doe", Age: 30}
data, err := json.Marshal(p)
if err != nil {
// Handle error
}
fmt.Println(string(data))
- NDJSON Decoding:
import (
    "fmt"
    "io"
    "strings"

    "github.com/segmentio/encoding/json"
)

input := `{"name":"John Doe","age":30}
{"name":"Jane Smith","age":25}`
dec := json.NewDecoder(strings.NewReader(input))
for {
    var record map[string]interface{}
    if err := dec.Decode(&record); err == io.EOF {
        break
    } else if err != nil {
        // Handle the error
        break
    }
    fmt.Println(record)
}
- NDJSON Encoding:
import "github.com/segmentio/encoding/json"
type LogEntry struct {
Timestamp time.Time `json:"timestamp"`
Message string `json:"message"`
}
entries := []LogEntry{
{Timestamp: time.Now(), Message: "Log entry 1"},
{Timestamp: time.Now(), Message: "Log entry 2"},
}
for _, entry := range entries {
data, _ := json.Marshal(entry)
fmt.Println(string(data))
}
Getting Started
To use the segmentio/encoding package in your Go project:
- Install the package:
go get github.com/segmentio/encoding
- Import the desired sub-package in your code:
import "github.com/segmentio/encoding/json" // or "github.com/segmentio/encoding/iso8601"
- Use the package functions as shown in the code examples above; a complete runnable sketch follows below.
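For a complete starting point, here is a minimal runnable program; the Point type is illustrative, and the only difference from standard-library code is the import path:
package main

import (
    "fmt"

    "github.com/segmentio/encoding/json"
)

// Point is an illustrative type; any struct that works with the
// standard library's encoding/json works here unchanged.
type Point struct {
    X int `json:"x"`
    Y int `json:"y"`
}

func main() {
    data, err := json.Marshal(Point{X: 1, Y: 2})
    if err != nil {
        panic(err)
    }
    fmt.Println(string(data)) // prints {"x":1,"y":2}
}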
Competitor Comparisons
Optimized Go Compression Packages
Pros of compress
- Offers a wider range of compression algorithms (Zstandard, S2/Snappy, gzip, deflate, etc.)
- Generally provides better compression ratios and faster performance
- Actively maintained with frequent updates and optimizations
Cons of compress
- More complex API, potentially steeper learning curve
- Larger codebase and dependency footprint
- May be overkill for simple encoding/decoding tasks
Code comparison
encoding (serialization, via the github.com/segmentio/encoding/json sub-package):
data, err := json.Marshal(map[string]string{"greeting": "Hello, World!"})
var decoded map[string]string
err = json.Unmarshal(data, &decoded)
compress (compression, via github.com/klauspost/compress/s2):
compressed := s2.Encode(nil, []byte("Hello, World!"))
decompressed, err := s2.Decode(nil, compressed)
Summary
encoding focuses on fast serialization and deserialization of structured data, while compress offers a broad range of compression algorithms with strong performance. The two solve different problems and are complementary rather than interchangeable: reach for encoding when marshaling cost dominates, and for compress when you need to shrink byte streams. The choice depends on the specific requirements of your project, such as compression ratio, speed, and complexity tolerance.
The Snappy compression format in the Go programming language.
Pros of snappy
- Hosted under the official golang GitHub organization, ensuring high-quality and well-maintained code
- Optimized for speed, providing fast compression and decompression
- Widely used and battle-tested in production environments
Cons of snappy
- Limited compression ratio compared to other algorithms
- Lacks advanced features like dictionary compression or tunable compression levels
Code comparison
snappy:
compressed := snappy.Encode(nil, data)
decompressed, err := snappy.Decode(nil, compressed)
encoding (serialization, not compression; via the json sub-package):
data, err := json.Marshal(v)
var out map[string]interface{}
err = json.Unmarshal(data, &out)
Key differences
- snappy focuses solely on the Snappy compression algorithm, while encoding provides encoders and decoders for multiple data formats
- encoding offers a diverse set of serialization tools, including high-performance JSON, ISO 8601 date validation, Protocol Buffers, and Thrift
- snappy is part of the official Go ecosystem, potentially benefiting from better integration with other Go tools and libraries
Use cases
- Choose snappy for fast, lightweight compression in Go projects, especially when working with large datasets that require quick processing
- Opt for encoding when you need a versatile toolkit for various serialization tasks in a single package
Community and support
- snappy benefits from the backing of the Go team and a larger community
- encoding is maintained by Segment and has a smaller but active community
Golang port of simdjson: parsing gigabytes of JSON per second
Pros of simdjson-go
- Utilizes SIMD instructions for faster JSON parsing
- Designed for high-performance, low-latency applications
- Supports streaming parsing for large JSON files
Cons of simdjson-go
- More complex implementation compared to encoding
- May have a steeper learning curve for developers
- Limited to JSON parsing, while encoding supports multiple encoding formats
Code Comparison
simdjson-go:
// github.com/minio/simdjson-go
pj, err := simdjson.ParseND([]byte(ndjsonString), nil)
if err != nil {
    log.Fatal(err)
}
iter := pj.Iter()
encoding:
// github.com/segmentio/encoding/json
var data map[string]interface{}
err := json.Unmarshal([]byte(jsonString), &data)
if err != nil {
    log.Fatal(err)
}
Summary
simdjson-go is a high-performance JSON parsing library that leverages SIMD instructions for speed, making it ideal for applications requiring fast JSON processing. It offers streaming capabilities for large files but is more complex to use compared to encoding.
encoding, on the other hand, is a more general-purpose encoding library that supports multiple formats beyond JSON. It's simpler to use and integrate but may not offer the same level of performance optimization for JSON parsing as simdjson-go.
The choice between the two depends on specific project requirements, with simdjson-go being better suited for high-performance JSON-centric applications, while encoding offers more versatility for various encoding needs.
Fast JSON parser and validator for Go. No custom structs, no code generation, no reflection
Pros of fastjson
- Significantly faster JSON parsing and validation performance
- Lower memory allocation and garbage collection overhead
- Supports streaming JSON parsing for large datasets
Cons of fastjson
- Less feature-rich compared to encoding, focusing primarily on speed
- May require more manual handling for complex JSON structures
- Less idiomatic Go code, potentially harder to read and maintain
Code Comparison
fastjson:
var p fastjson.Parser
v, err := p.Parse(jsonStr)
if err != nil {
    return err
}
name := v.GetStringBytes("name")
encoding:
var data map[string]interface{}
err := json.Unmarshal([]byte(jsonStr), &data)
if err != nil {
    return err
}
name, ok := data["name"].(string)
if !ok {
    // handle a missing or non-string "name" field
}
Summary
fastjson excels in performance-critical scenarios where raw speed is crucial, while encoding offers a more feature-rich and idiomatic Go experience. fastjson is ideal for high-throughput applications dealing with large volumes of JSON data, whereas encoding provides a more familiar and flexible API for general-purpose JSON handling in Go projects. The choice between the two depends on the specific requirements of your application, balancing performance needs against ease of use and maintainability.
README
encoding
Go package containing implementations of encoders and decoders for various data formats.
Motivation
At Segment, we do a lot of marshaling and unmarshaling of data when sending, queuing, or storing messages. The resources we need to provision on the infrastructure are directly related to the type and amount of data that we are processing. At the scale we operate at, the tools we choose to build programs with can have a large impact on the efficiency of our systems. It is important to explore alternative approaches when we reach the limits of the code we use.
This repository includes experiments for Go packages for marshaling and unmarshaling data in various formats. While the focus is on providing a high performance library, we also aim for very low development and maintenance overhead by implementing APIs that can be used as drop-in replacements for the default solutions.
Requirements and Maintenance Schedule
This package has no dependencies outside of the core runtime of Go. It requires a recent version of Go.
This package follows the same maintenance schedule as the Go project, meaning that issues relating to versions of Go which aren't supported by the Go team, or versions of this package which are older than 1 year, are unlikely to be considered.
Additionally, we have fuzz tests which aren't a runtime-required dependency but will be pulled in when running go mod tidy. Please don't include these go.mod updates in change requests.
encoding/json
More details about how this package achieves a lower CPU and memory footprint can be found in the package README.
The json sub-package provides a re-implementation of the functionality offered by the standard library's encoding/json package, with a focus on lowering the CPU and memory footprint of the code.
The exported API of this package mirrors the standard library's encoding/json package; the only change needed to take advantage of the performance improvements is the import path of the json package, from:
import (
    "encoding/json"
)
to
import (
    "github.com/segmentio/encoding/json"
)
The improvement can be significant for code that heavily relies on serializing and deserializing JSON payloads. The CI pipeline runs benchmarks to compare the performance of the package with the standard library and other popular alternatives; here's an overview of the results:
Comparing to encoding/json (v1.16.2)
name old time/op new time/op delta
Marshal/*json.codeResponse2 6.40ms ± 2% 3.82ms ± 1% -40.29% (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2 28.1ms ± 3% 5.6ms ± 3% -80.21% (p=0.008 n=5+5)
name old speed new speed delta
Marshal/*json.codeResponse2 303MB/s ± 2% 507MB/s ± 1% +67.47% (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2 69.2MB/s ± 3% 349.6MB/s ± 3% +405.42% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Marshal/*json.codeResponse2 0.00B 0.00B ~ (all equal)
Unmarshal/*json.codeResponse2 1.80MB ± 1% 0.02MB ± 0% -99.14% (p=0.016 n=5+4)
name old allocs/op new allocs/op delta
Marshal/*json.codeResponse2 0.00 0.00 ~ (all equal)
Unmarshal/*json.codeResponse2 76.6k ± 0% 0.1k ± 3% -99.92% (p=0.008 n=5+5)
Benchmarks were run on a Core i9-8950HK CPU @ 2.90GHz.
Comparing to github.com/json-iterator/go (v1.1.10)
name old time/op new time/op delta
Marshal/*json.codeResponse2 6.19ms ± 3% 3.82ms ± 1% -38.26% (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2 8.52ms ± 3% 5.55ms ± 3% -34.84% (p=0.008 n=5+5)
name old speed new speed delta
Marshal/*json.codeResponse2 313MB/s ± 3% 507MB/s ± 1% +61.91% (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2 228MB/s ± 3% 350MB/s ± 3% +53.50% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Marshal/*json.codeResponse2 8.00B ± 0% 0.00B -100.00% (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2 1.05MB ± 0% 0.02MB ± 0% -98.53% (p=0.000 n=5+4)
name old allocs/op new allocs/op delta
Marshal/*json.codeResponse2 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
Unmarshal/*json.codeResponse2 37.2k ± 0% 0.1k ± 3% -99.83% (p=0.008 n=5+5)
Although this package aims to be a drop-in replacement of encoding/json, it does not guarantee the same error messages. It will error in the same cases as the standard library, but the exact error message may be different.
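In practice, callers should branch on the presence (or type) of an error rather than on its message text. A minimal sketch, assuming a data []byte payload:
var v map[string]interface{}
if err := json.Unmarshal(data, &v); err != nil {
    // Don't match on err.Error(): the wording may differ from the
    // standard library even though the failing cases are the same.
    log.Printf("failed to decode payload: %v", err)
}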
encoding/iso8601
The iso8601 sub-package exposes APIs to efficiently deal with string representations of iso8601 dates.
Data formats like JSON have no syntax to represent dates; they are usually serialized and represented as a string value. In our experience, we often have to check whether a string value looks like a date, and either construct a time.Time by parsing it or simply treat it as a string. This check can be done by attempting to parse the value and, if that fails, falling back to using the raw string. Unfortunately, while the happy path for time.Parse is fairly efficient, constructing errors is much slower and has a much bigger memory footprint.
To remediate this problem, we've developed fast iso8601 validation functions that cause no heap allocations. We added a validation step to determine whether the value is a date representation or a simple string. This reduced CPU and memory usage by 5% in some programs that were doing time.Parse calls on very hot code paths.
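A sketch of the pattern described above (iso8601.Valid is assumed here to be the exported validation entry point; check the sub-package documentation for the exact API):
import (
    "time"

    "github.com/segmentio/encoding/iso8601"
)

// parseMaybeDate treats s as a date only when it validates first:
// validation performs no heap allocations, whereas a failed time.Parse
// pays the cost of constructing an error value.
// Note: iso8601.Valid is assumed; verify against the package docs.
func parseMaybeDate(s string) (time.Time, bool) {
    if !iso8601.Valid(s) {
        return time.Time{}, false // keep the value as a plain string
    }
    t, err := time.Parse(time.RFC3339, s)
    return t, err == nil
}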