data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

1,380

Top Related Projects

pandas

46,172

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

arrow

15,787

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

dask

13,376

Parallel computing with task scheduling

modin

10,249

Modin: Scale your Pandas workflows by changing a single line of code

vaex

8,418

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

polars

34,705

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Quick Overview

Data-Forge is a powerful data transformation and analysis toolkit for JavaScript and TypeScript. It provides a fluent API for working with datasets, offering functionality similar to pandas in Python or dplyr in R. Data-Forge is designed to handle both tabular and time series data efficiently.

Pros

Comprehensive API for data manipulation and analysis
Supports both JavaScript and TypeScript
Extensible through plugins
Well-documented with extensive examples

Cons

Learning curve for users new to data manipulation libraries
Can be slower than hand-written JavaScript loops and array operations on large datasets
Limited built-in visualization capabilities
Smaller community compared to more established data libraries

Code Examples

Loading and filtering data:

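import { readFile } from 'data-forge-fs';

// parseCSV() returns a promise, so await it before chaining DataFrame methods.
const data = await readFile('data.csv').parseCSV();

const df = data
    .parseInts(['age']) // CSV values are read as strings; convert before comparing.
    .where(row => row.age > 30)
    .select(row => ({
        name: row.name,
        age: row.age
    }));

console.log(df.head(5).toArray());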

Performing calculations on columns:

import { DataFrame } from 'data-forge';

const df = new DataFrame({
    columns: {
        A: [1, 2, 3, 4, 5],
        B: [10, 20, 30, 40, 50]
    }
});

const result = df
    .generateSeries({
        C: row => row.A * 2,
        D: row => row.B / 10
    })
    .toArray();

console.log(result);

Grouping and aggregating data:

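import { readFile } from 'data-forge-fs';

// Await the parsed DataFrame first; parseCSV() returns a promise.
const sales = await readFile('sales.csv').parseCSV();

const summary = sales
    .parseFloats(['sales', 'price']) // Convert CSV strings to numbers.
    .groupBy(row => row.category)
    .select(group => ({
        category: group.first().category,
        totalSales: group.deflate(row => row.sales).sum(),
        averagePrice: group.deflate(row => row.price).average()
    }))
    .toArray();

console.log(summary);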

Getting Started

To get started with Data-Forge, follow these steps:

Install Data-Forge:
```
npm install data-forge
```

Import and use Data-Forge in your TypeScript or JavaScript project:

import { DataFrame } from 'data-forge';

const df = new DataFrame({
    columns: {
        Name: ['John', 'Jane', 'Bob'],
        Age: [25, 30, 35]
    }
});

console.log(df.toString());

Explore the documentation and examples on the official Data-Forge website to learn more about its capabilities and advanced features.

Competitor Comparisons

pandas

Pros of pandas

Extensive functionality and mature ecosystem
Highly optimized for performance with C extensions
Large community and extensive documentation

Cons of pandas

Steeper learning curve for beginners
Memory-intensive for large datasets
Python-specific, not easily portable to other languages

Code Comparison

pandas:

import pandas as pd

df = pd.read_csv('data.csv')
filtered = df[df['column'] > 5]
result = filtered.groupby('category').mean()

Data-Forge:

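import { readFile } from 'data-forge-fs';

const df = await readFile('data.csv').parseCSV();
const filtered = df.parseFloats(['column']).where(row => row.column > 5);
// Average a series within each group.
const result = filtered.groupBy(row => row.category).select(group => ({
    category: group.first().category,
    mean: group.deflate(row => row.column).average()
}));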

Both libraries offer similar functionality for data manipulation, but pandas has a more concise syntax due to its specialized data structures. Data-Forge follows a more functional programming approach and is designed for TypeScript, making it more suitable for JavaScript/TypeScript developers working with data in web applications or Node.js environments.
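To make the "functional approach" concrete, the sketch below reproduces the same filter, group and average steps using only plain TypeScript arrays. It does not use Data-Forge or pandas; the column names `category` and `column` come from the comparison above, and the sample rows are invented for illustration.

```typescript
interface Row {
    category: string;
    column: number;
}

// Invented sample data standing in for data.csv.
const rows: Row[] = [
    { category: "a", column: 4 },
    { category: "a", column: 6 },
    { category: "a", column: 10 },
    { category: "b", column: 8 },
];

// Step 1: keep rows where column > 5 (the "where" step).
const filtered = rows.filter(row => row.column > 5);

// Step 2: bucket the values by category (the "groupBy" step).
const groups: { [category: string]: number[] } = {};
filtered.forEach(row => {
    (groups[row.category] = groups[row.category] || []).push(row.column);
});

// Step 3: reduce each bucket to its mean (the "select" step).
const result = Object.keys(groups).map(category => {
    const values = groups[category] || [];
    const total = values.reduce((sum, value) => sum + value, 0);
    return { category, mean: total / values.length };
});
```

Each step takes a collection and returns a new one without modifying its input, which is the style both code samples above follow.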

arrow

Pros of Arrow

Highly performant and memory-efficient columnar memory format
Supports multiple programming languages and platforms
Extensive ecosystem and community support

Cons of Arrow

Steeper learning curve due to its complexity
May be overkill for smaller-scale data processing tasks
Requires more setup and configuration

Code Comparison

Arrow (C++):

#include <arrow/api.h>

std::shared_ptr<arrow::Table> table;
arrow::MemoryPool* pool = arrow::default_memory_pool();
arrow::Int64Builder builder(pool);
builder.AppendValues({1, 2, 3, 4, 5});

Data-Forge-TS (TypeScript):

import { DataFrame } from 'data-forge';

const df = new DataFrame({
    columnNames: ["Value"],
    rows: [[1], [2], [3], [4], [5]]
});

Key Differences

Arrow focuses on efficient data representation and interoperability
Data-Forge-TS is more oriented towards data manipulation and analysis in JavaScript and TypeScript
Arrow has a broader scope and supports multiple languages, while Data-Forge-TS targets JavaScript and TypeScript only
Arrow is better suited for large-scale data processing, while Data-Forge-TS is more appropriate for smaller datasets and simpler operations
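The difference between row-oriented and column-oriented data can be sketched in a few lines of plain TypeScript. This is only an illustration of the two shapes (they correspond loosely to the `rows` and `columns` constructor forms shown elsewhere on this page); it is not Arrow's actual memory layout, and the helper names `toColumns` and `toRows` are hypothetical. It assumes every row has the same keys.

```typescript
// Row-oriented: one object per record.
type RowTable = Array<{ [column: string]: number }>;
// Column-oriented: one array per column.
type ColumnTable = { [column: string]: number[] };

// Pivot records into per-column arrays.
function toColumns(rows: RowTable): ColumnTable {
    const columns: ColumnTable = {};
    rows.forEach(row => {
        Object.keys(row).forEach(name => {
            (columns[name] = columns[name] || []).push(row[name]);
        });
    });
    return columns;
}

// Pivot per-column arrays back into records.
function toRows(columns: ColumnTable): RowTable {
    const names = Object.keys(columns);
    const length = names.length > 0 ? columns[names[0]].length : 0;
    const rows: RowTable = [];
    for (let i = 0; i < length; i += 1) {
        const row: { [column: string]: number } = {};
        names.forEach(name => {
            row[name] = columns[name][i];
        });
        rows.push(row);
    }
    return rows;
}

const rowForm: RowTable = [{ A: 1, B: 10 }, { A: 2, B: 20 }];
const columnForm = toColumns(rowForm); // { A: [1, 2], B: [10, 20] }
const roundTrip = toRows(columnForm);  // Same records as rowForm
```

Columnar layouts keep all values of one column together, which is why they tend to suit scans and aggregations over a single column.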

dask

Pros of Dask

Designed for large-scale parallel computing and distributed processing
Integrates well with the Python scientific ecosystem (NumPy, Pandas)
Supports complex workflows and task scheduling

Cons of Dask

Steeper learning curve, especially for distributed computing concepts
More complex setup and configuration for distributed environments
Potentially overkill for smaller datasets or simpler data processing tasks

Code Comparison

Data-Forge-TS:

import { DataFrame } from 'data-forge';

const df = new DataFrame([
    { A: 1, B: 10 },
    { A: 2, B: 20 },
]);
const result = df.getSeries('A').select(value => value * 2);

Dask:

import dask.dataframe as dd
import pandas as pd

df = dd.from_pandas(pd.DataFrame({
    'A': [1, 2],
    'B': [10, 20]
}), npartitions=2)
result = df['A'] * 2

Key Differences

Data-Forge-TS is TypeScript-based, while Dask is Python-based
Dask focuses on distributed computing and big data, while Data-Forge-TS is more suited for in-memory data processing
Dask has a wider range of data structures and integrations with scientific libraries

modin

Pros of Modin

Designed for large-scale data processing with distributed computing capabilities
Seamless integration with existing pandas code, requiring minimal changes
Significantly faster performance for large datasets compared to pandas

Cons of Modin

Limited functionality compared to pandas, not all operations are supported
Requires additional setup and dependencies for distributed computing
May have overhead for small datasets, potentially slower than pandas

Code Comparison

Modin:

import modin.pandas as pd

df = pd.read_csv("large_dataset.csv")
result = df.groupby("category").mean()

Data-Forge-TS:

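import { readFile } from "data-forge-fs";

const df = await readFile("large_dataset.csv").parseCSV();
// "value" is a placeholder column name; average it within each category.
const result = df.parseFloats(["value"]).groupBy(row => row.category).select(group => ({
    category: group.first().category,
    mean: group.deflate(row => row.value).average()
}));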

Key Differences

Modin focuses on scaling pandas operations for big data, while Data-Forge-TS is a TypeScript data manipulation library
Modin aims for pandas compatibility, whereas Data-Forge-TS has its own API design
Modin is better suited for large-scale data processing, while Data-Forge-TS is more appropriate for smaller datasets and TypeScript projects

vaex

Pros of Vaex

Designed to handle very large tabular datasets efficiently (the project advertises processing on the order of a billion rows per second)
Supports out-of-core computing, allowing processing of data larger than RAM
Offers advanced visualization capabilities for big data

Cons of Vaex

Primarily focused on tabular data, less versatile for other data structures
Steeper learning curve due to its specialized nature
Less integration with TypeScript ecosystem

Code Comparison

Data-Forge-TS:

import { DataFrame } from 'data-forge';

const df = new DataFrame({
    columnNames: ["A", "B", "C"],
    rows: [[1, 2, 3], [4, 5, 6]]
});

Vaex:

import vaex

df = vaex.from_arrays(
    A=[1, 4], B=[2, 5], C=[3, 6]
)

Summary

Vaex excels in handling extremely large datasets and provides powerful visualization tools, making it ideal for big data analysis. However, it may be overkill for smaller projects and has a steeper learning curve. Data-Forge-TS, on the other hand, offers a more general-purpose data manipulation library with better TypeScript integration, but may not be as efficient for very large datasets. The choice between the two depends on the specific requirements of your project, particularly in terms of data size and processing needs.

polars

Pros of Polars

Written in Rust, offering high performance and memory efficiency
Supports both eager and lazy execution modes
Provides a wide range of data manipulation and analysis functions

Cons of Polars

Steeper learning curve due to its Rust foundations
Less integrated with the TypeScript/JavaScript ecosystem
May require additional setup for use in web-based environments

Code Comparison

Data-Forge-TS:

import { DataFrame } from 'data-forge';

const df = new DataFrame([
  { A: 1, B: 'x' },
  { A: 2, B: 'y' },
]);
const filtered = df.where(row => row.A > 1);

Polars:

import polars as pl

df = pl.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
filtered = df.filter(pl.col('A') > 1)

Summary

Polars is a high-performance data manipulation library written in Rust, offering excellent speed and efficiency. It provides a rich set of features for data analysis and supports both eager and lazy execution. However, it may have a steeper learning curve and less seamless integration with TypeScript/JavaScript projects compared to Data-Forge-TS.

Data-Forge-TS, on the other hand, is specifically designed for TypeScript and JavaScript environments, making it more accessible for web developers. While it may not match Polars in raw performance, it offers a familiar API and easier integration with existing TypeScript/JavaScript codebases.

README

Data-Forge

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Implemented in TypeScript.
Used in JavaScript ES5+ or TypeScript.

Love this? Please star this repo and click here to support my work

Please note that this TypeScript repository replaces the previous JavaScript version of Data-Forge.

BREAKING CHANGES

As of v1.6.9 the dependencies Sugar, Lodash and Moment have been factored out (or replaced with smaller dependencies). This more than halves the bundle size. Hopefully this won't cause any problems - but please log an issue if something changes that you weren't expecting.

As of v1.3.0 file system support has been removed from the Data-Forge core API. This follows repeated issues from users trying to get Data-Forge working in the browser, especially under Angular 6.

Functions for reading and writing files have been moved to the separate code library Data-Forge FS.

If you used the file read and write functions before 1.3.0, your code will stop working when you upgrade to 1.3.0 or later. The fix is simple. Where you previously required only Data-Forge:

const dataForge = require('data-forge');

Now you must also require in the new library as well:

const dataForge = require('data-forge');
require('data-forge-fs');

Data-Forge FS augments Data-Forge core so that you can use the readFile/writeFile functions as in previous versions and as is shown in this readme and the guide.

If you still have problems with Angular 6 please see this workaround: https://github.com/data-forge/data-forge-ts/issues/3#issuecomment-438580174

Install

To install for Node.js and the browser:

npm install --save data-forge

If working in Node.js and you want the functions to read and write data files:

npm install --save data-forge-fs

Quick start

Data-Forge can load CSV, JSON or arbitrary data sets.

Parse the data, filter it, transform it, aggregate it, sort it and much more.

Use the data however you want or export it to CSV or JSON.

Here's an example:

const dataForge = require('data-forge');
require('data-forge-fs'); // For readFile/writeFile.

// predicate and transform below are placeholders for your own functions.
dataForge.readFileSync('./input-data-file.csv') // Read CSV file (or JSON!)
    .parseCSV()
    .parseDates(["Column A"]) // Parse date columns.
    .parseInts(["Column B", "Column C"]) // Parse integer columns.
    .parseFloats(["Column D", "Column E"]) // Parse float columns.
    .dropSeries(["Column F"]) // Drop certain columns.
    .where(row => predicate(row)) // Filter rows.
    .select(row => transform(row)) // Transform the data.
    .asCSV()
    .writeFileSync("./output-data-file.csv"); // Write to output CSV file (or JSON!)
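
The quick start leaves `predicate` and `transform` undefined. As a rough sketch of what such functions might look like, the example below defines both for an assumed row shape and applies them to a plain array, with `filter` and `map` standing in for `where` and `select`. The row interfaces, the output field names and the sample values are all hypothetical.

```typescript
// Hypothetical row shape; the real columns depend on your CSV file.
interface InputRow {
    "Column B": number;
    "Column C": number;
    "Column D": number;
}

// Hypothetical output shape produced by transform.
interface OutputRow {
    total: number;
    ratio: number;
}

// A predicate returns true for the rows to keep.
function predicate(row: InputRow): boolean {
    return row["Column B"] > 0;
}

// A transform maps each input row to a new output row.
function transform(row: InputRow): OutputRow {
    return {
        total: row["Column B"] + row["Column C"],
        ratio: row["Column D"] / row["Column B"],
    };
}

// Plain arrays stand in for a DataFrame here.
const sampleRows: InputRow[] = [
    { "Column B": 2, "Column C": 3, "Column D": 1 },
    { "Column B": 0, "Column C": 7, "Column D": 4 },
];
const output: OutputRow[] = sampleRows
    .filter(row => predicate(row))
    .map(row => transform(row));
```

Filtering before transforming means `transform` never sees the row where "Column B" is zero, so the division is safe in this sample.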

From the browser

Data-Forge has been tested with Browserify and Webpack. Please see links to examples below.

If you aren't using Browserify or Webpack, the npm package includes a pre-packed browser distribution that you can install and include in your HTML as follows:

<script language="javascript" type="text/javascript" src="node_modules/data-forge/dist/web/index.js"></script>

This gives you the data-forge package mounted under the global variable dataForge.

Please remember that you can't use data-forge-fs or the file system functions in the browser.

Features

Import and export CSV and JSON data and text files (when using Data-Forge FS).
Or work with arbitrary JavaScript data.
Many options for working with your data:
- Filtering
- Transformation
- Extracting subsets
- Grouping, aggregation and summarization
- Sorting
- And much more
Great for slicing and dicing tabular data:
- Add, remove, transform and generate named columns (series) of data.
Great for working with time series data.
Your data is indexed so you have the ability to merge and aggregate.
Your data is immutable! Transformations and modifications produce a new dataset.
Build data pipelines that are evaluated lazily.
Inspired by Pandas and LINQ, so it might feel familiar!
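
The immutability and lazy-evaluation points in the list above can be illustrated with a small stand-alone sketch. The `LazySequence` class below is hypothetical and is not how Data-Forge is implemented internally; it only shows the general idea that each operation returns a new object and that no work happens until the result is requested.

```typescript
// A tiny immutable, lazily evaluated sequence (illustrative only).
class LazySequence<T> {
    // The producer function is only called when toArray() runs.
    constructor(private readonly produce: () => T[]) {}

    static from<T>(values: T[]): LazySequence<T> {
        const copy = values.slice(); // Copy so later changes by the caller do not leak in.
        return new LazySequence<T>(() => copy);
    }

    // Each operation returns a new sequence and leaves this one untouched.
    where(keep: (value: T) => boolean): LazySequence<T> {
        return new LazySequence<T>(() => this.produce().filter(value => keep(value)));
    }

    select<U>(transform: (value: T) => U): LazySequence<U> {
        return new LazySequence<U>(() => this.produce().map(value => transform(value)));
    }

    // Forces evaluation of the whole pipeline and returns a fresh array.
    toArray(): T[] {
        return this.produce().slice();
    }
}

let transformCalls = 0; // Counts how often the transform actually runs.
const source = LazySequence.from([1, 2, 3, 4]);
const pipeline = source
    .where(n => n % 2 === 0)
    .select(n => {
        transformCalls += 1;
        return n * 10;
    });

const callsBeforeEvaluation = transformCalls; // Nothing has run yet.
const evaluated = pipeline.toArray();         // The pipeline runs here.
const original = source.toArray();            // The source is unchanged.
```

Building the pipeline records what to do; only `toArray()` triggers the filter and transform, and the original sequence can still be reused afterwards.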

Contributions

Want a bug fixed or maybe to improve performance?

Don't see your favourite feature?

Need to add your favourite Pandas or LINQ feature?

Please contribute and help improve this library for everyone!

Fork it, make a change, submit a pull request. Want to chat? See my contact details at the end or reach out on Gitter.

Platforms

Node.js (npm install --save data-forge data-forge-fs) (see example here)
Browser
- Via bower (bower install --save data-forge) (see example here)
- Via Browserify (see example here)
- Via Webpack (see example here)

Documentation

Resources

Data Wrangling with JavaScript

Contact

Please reach out and tell me what you are doing with Data-Forge or how you'd like to see it improved.

Twitter: @codecapers
Email: ashley@codecapers.com.au
Linkedin: www.linkedin.com/in/ashleydavis75
Web: www.codecapers.com.au

Support the developer

Click here to support the developer.
