
uhop/stream-json

A micro-library of Node.js stream components for creating custom JSON processing pipelines with a minimal memory footprint. It can parse JSON files far exceeding available memory, streaming individual primitives through a SAX-inspired API.


Top Related Projects

JSONStream: rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects)

Quick Overview

Stream-JSON is a powerful Node.js library for parsing and processing large JSON files and streams. It provides a streaming API that allows for efficient handling of JSON data, even when dealing with files that are too large to fit into memory.

Pros

  • Efficient memory usage through streaming, allowing processing of very large JSON files
  • High performance due to its streaming nature and optimized parsing
  • Flexible and composable architecture with various plugins for different use cases
  • Supports both parsing and stringifying JSON data

Cons

  • Steeper learning curve compared to simple JSON.parse() for basic use cases
  • Requires more setup and code for simple JSON processing tasks
  • Limited browser support (primarily designed for Node.js environments)
  • May be overkill for small JSON files or simple parsing needs

Code Examples

Parsing a large JSON file:

const { parser } = require('stream-json');
const fs = require('fs');

const pipeline = fs.createReadStream('large-file.json').pipe(parser());

pipeline.on('data', data => {
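  // 'data' is a single parser token, e.g. {name: 'startObject'} or {name: 'keyValue', value: 'id'}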
  console.log(data);
});

pipeline.on('end', () => {
  console.log('Parsing completed');
});

Filtering specific keys from a JSON stream:

const { chain } = require('stream-chain');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamValues } = require('stream-json/streamers/StreamValues');
const fs = require('fs');

const pipeline = chain([
  fs.createReadStream('data.json'),
  parser(),
  pick({filter: 'users'}),
  streamValues(),
  data => data.value
]);

pipeline.on('data', user => {
  console.log(user);
});

Converting a token stream back to JSON text (a minimal sketch using Stringer; file names are illustrative):

const { chain } = require('stream-chain');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { stringer } = require('stream-json/Stringer');
const fs = require('fs');

// re-serialize only the 'data' subtree of a large file as JSON text
chain([
  fs.createReadStream('large-file.json'),
  parser(),
  pick({ filter: 'data' }),
  stringer(),
  fs.createWriteStream('data-only.json')
]);

Getting Started

To use Stream-JSON in your project, first install it via npm:

npm install stream-json

Then, import and use it in your Node.js application:

const { parser } = require('stream-json');
const fs = require('fs');

const pipeline = fs.createReadStream('data.json').pipe(parser());

pipeline.on('data', data => {
  // Process each parsed token
  console.log(data);
});

pipeline.on('end', () => {
  console.log('Parsing completed');
});

This basic setup allows you to start parsing JSON streams efficiently. Explore the library's documentation for more advanced usage and available plugins.

Competitor Comparisons

JSONStream: rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects)

Pros of JSONStream

  • Simpler API with fewer options, making it easier to get started
  • Lightweight with minimal dependencies
  • Well-established project with a longer history and wider adoption

Cons of JSONStream

  • Less flexible parsing options compared to stream-json
  • Limited support for custom transformations and advanced use cases
  • Slower performance for large JSON files or complex parsing scenarios

Code Comparison

JSONStream:

const JSONStream = require('JSONStream');
const fs = require('fs');

fs.createReadStream('data.json')
  .pipe(JSONStream.parse('*.name'))
  .on('data', console.log);

stream-json:

const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');
const fs = require('fs');

// assumes data.json holds a top-level array, matching the '*.name' pattern above
fs.createReadStream('data.json')
  .pipe(parser())
  .pipe(streamArray())
  .on('data', ({value}) => console.log(value.name));

Key Differences

  • JSONStream uses a simpler, more concise syntax for basic parsing tasks
  • stream-json offers more granular control over parsing and streaming
  • stream-json provides better performance for large-scale data processing
  • JSONStream is more suitable for quick, straightforward JSON parsing needs
  • stream-json excels in scenarios requiring advanced customization and optimization

Both libraries have their strengths, and the choice between them depends on the specific requirements of your project, such as performance needs, parsing complexity, and desired level of control over the streaming process.


README

stream-json

stream-json is a micro-library of node.js stream components with minimal dependencies for creating custom data processors oriented on processing huge JSON files while requiring a minimal memory footprint. It can parse JSON files far exceeding available memory. Even individual primitive data items (keys, strings, and numbers) can be streamed piece-wise. Streaming SAX-inspired event-based API is included as well.
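
For a taste of the SAX-like API, here is a minimal sketch (the tiny input literal is made up; the exact token mix depends on parser options):

const {parser} = require('stream-json');

const tokens = parser();
// each data event is one token object, e.g. {name: 'startObject'},
// {name: 'keyValue', value: 'a'}, {name: 'numberValue', value: '1'}, {name: 'endObject'}
tokens.on('data', token => console.log(token));
tokens.end('{"a": 1}');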

Available components:

  • Streaming JSON Parser.
    • It produces a SAX-like token stream.
    • Optionally it can pack keys, strings, and numbers (controlled separately).
    • The main module provides helpers to create a parser.
  • Filters to edit a token stream:
    • Pick selects desired objects.
      • It can produce multiple top-level objects just like in the JSON Streaming protocol.
      • Don't forget to use StreamValues when picking several subobjects!
    • Replace substitutes objects with a replacement.
    • Ignore removes objects.
    • Filter filters tokens maintaining the stream's validity.
  • Streamers to produce a stream of JavaScript objects.
    • StreamValues can handle a stream of JSON objects.
      • Useful to stream objects selected by Pick, or generated by other means.
      • It supports JSON Streaming protocol, where individual values are separated semantically (like in "{}[]"), or with white spaces (like in "true 1 null").
    • StreamArray takes an array of objects and produces a stream of its components (see the StreamArray sketch after this list).
      • It streams array components individually taking care of assembling them automatically.
      • Created initially to deal with JSON files similar to Django-produced database dumps.
      • Only one top-level array per stream is valid!
    • StreamObject takes an object and produces a stream of its top-level properties.
      • Only one top-level object per stream is valid!
  • Essentials:
    • Assembler interprets a token stream creating JavaScript objects.
    • Disassembler produces a token stream from JavaScript objects.
    • Stringer converts a token stream back into a JSON text stream.
    • Emitter reads a token stream and emits each token as an event.
      • It can greatly simplify data processing.
  • Utilities:
    • emit() makes any stream component emit tokens as events.
    • withParser() helps to create stream components with a parser.
    • Batch batches items into arrays to simplify their processing.
    • Verifier reads a stream and verifies that it is valid JSON.
    • Utf8Stream sanitizes multibyte utf8 text input.
  • Special helpers:
    • JSONL AKA JSON Lines AKA NDJSON:
      • jsonl/Parser parses a JSONL file producing objects similar to StreamValues (see the JSONL sketch after this list).
        • Useful when we know that individual items can fit in memory.
        • Generally it is faster than the equivalent combination of Parser({jsonStreaming: true}) + StreamValues.
      • jsonl/Stringer produces a JSONL file from a stream of JavaScript objects.
        • Generally it is faster than the equivalent combination of Disassembler + Stringer.
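
As a concrete example of the streamers, here is a minimal StreamArray sketch (the file name is hypothetical, and the file is assumed to contain one huge top-level array):

const StreamArray = require('stream-json/streamers/StreamArray');
const fs = require('fs');

const pipeline = fs.createReadStream('big-array.json')
  .pipe(StreamArray.withParser());

// each data event is {key: <array index>, value: <fully assembled element>}
pipeline.on('data', ({key, value}) => console.log(key, value));
pipeline.on('end', () => console.log('all elements processed'));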
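
A similar sketch for the JSONL parser (the file name is hypothetical, the lower-case parser factory export is assumed from the library's usual convention, and each line of the input is assumed to be a complete JSON value):

const {parser: jsonlParser} = require('stream-json/jsonl/Parser');
const fs = require('fs');

const pipeline = fs.createReadStream('events.jsonl').pipe(jsonlParser());

// like StreamValues, each data event carries {key: <item counter>, value: <parsed line>}
pipeline.on('data', ({value}) => console.log(value));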

All components are meant to be building blocks to create flexible custom data processing pipelines. They can be extended and/or combined with custom code. They can be used together with stream-chain to simplify data processing.

This toolkit is distributed under the New BSD license.

Introduction

const {chain}  = require('stream-chain');

const {parser} = require('stream-json');
const {pick}   = require('stream-json/filters/Pick');
const {ignore} = require('stream-json/filters/Ignore');
const {streamValues} = require('stream-json/streamers/StreamValues');

const fs   = require('fs');
const zlib = require('zlib');

const pipeline = chain([
  fs.createReadStream('sample.json.gz'),
  zlib.createGunzip(),
  parser(),
  pick({filter: 'data'}),
  ignore({filter: /\b_meta\b/i}),
  streamValues(),
  data => {
    const value = data.value;
    // keep data only for the accounting department
    return value && value.department === 'accounting' ? data : null;
  }
]);

let counter = 0;
pipeline.on('data', () => ++counter);
pipeline.on('end', () =>
  console.log(`The accounting department has ${counter} employees.`));

See the full documentation in Wiki.

Companion projects:

  • stream-csv-as-json streams huge CSV files in a format compatible with stream-json: rows as arrays of string values. If a header row is used, it can stream rows as objects with named fields.

Installation

npm install --save stream-json
# or: yarn add stream-json

Use

The whole library is organized as a set of small components, which can be combined to produce the most effective pipeline. All components are based on node.js streams and events. They implement all required standard APIs. It is easy to add your own components to solve your unique tasks.

The code of all components is compact and simple. Please take a look at their source code to see how things are implemented, so you can produce your own components in no time.
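
As an illustration of adding your own component, here is a minimal sketch of a plain object-mode Transform that can be dropped into a pipeline like any other stream (the active field checked below is made up):

const {Transform} = require('stream');

// keeps only items whose value has a truthy `active` flag;
// expects {key, value} objects such as those produced by the streamers
const onlyActive = new Transform({
  objectMode: true,
  transform(data, _encoding, callback) {
    if (data && data.value && data.value.active) this.push(data);
    callback();
  }
});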

If you find a bug, see a way to simplify the existing components, or create new generic components that can be reused in a variety of projects, don't hesitate to open a ticket and/or create a pull request.

Release History

  • 1.9.0 fixed a slight deviation from the JSON standard. Thx Peter Burns.
  • 1.8.0 added an option to indicate/ignore JSONL errors. Thx, AK.
  • 1.7.5 fixed a stringer bug with ASCII control symbols. Thx, Kraicheck.
  • 1.7.4 updated dependency (stream-chain), bugfix: inconsistent object/array braces. Thx Xiao Li.
  • 1.7.3 added an assembler option to treat numbers as strings.
  • 1.7.2 added an error check for JSONL parsing. Thx Marc-Andre Boily.
  • 1.7.1 minor bugfix and improved error reporting.
  • 1.7.0 added utils/Utf8Stream to sanitize utf8 input, all parsers support it automatically. Thx john30 for the suggestion.
  • 1.6.1 the technical release, no need to upgrade.
  • 1.6.0 added jsonl/Parser and jsonl/Stringer.

Older releases are listed in the project's wiki on the Release history page.
