data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Top Related Projects

  • pandas (43,524 stars): Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
  • Apache Arrow (14,426 stars): A multi-language toolbox for accelerated data interchange and in-memory processing
  • Dask (12,495 stars): Parallel computing with task scheduling
  • Modin (9,845 stars): Scale your Pandas workflows by changing a single line of code
  • Vaex (8,280 stars): Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
  • Polars (29,748 stars): Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Quick Overview

Data-Forge is a powerful data transformation and analysis toolkit for JavaScript and TypeScript. It provides a fluent API for working with datasets, offering functionality similar to pandas in Python or dplyr in R. Data-Forge is designed to handle both tabular and time series data efficiently.

Pros

  • Comprehensive API for data manipulation and analysis
  • Supports both JavaScript and TypeScript
  • Extensible through plugins
  • Well-documented with extensive examples

Cons

  • Learning curve for users new to data manipulation libraries
  • Can be slower than hand-written JavaScript loops on large datasets
  • Limited built-in visualization capabilities
  • Smaller community compared to more established data libraries

Code Examples

Loading and filtering data:

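import { readFile } from 'data-forge-fs';

// parseCSV() returns a Promise, so await it before chaining further calls.
const df = (await readFile('data.csv').parseCSV())
    .parseInts('age') // CSV values are strings; convert 'age' to numbers.
    .where(row => row.age > 30)
    .select(row => ({
        name: row.name,
        age: row.age
    }));

console.log(df.head(5).toArray());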

Performing calculations on columns:

import { DataFrame } from 'data-forge';

const df = new DataFrame({
    columns: {
        A: [1, 2, 3, 4, 5],
        B: [10, 20, 30, 40, 50]
    }
});

const result = df
    .generateSeries({
        C: row => row.A * 2,
        D: row => row.B / 10
    })
    .toArray();

console.log(result);
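
To show what the generateSeries call produces, here is a plain TypeScript sketch of the same computation over row objects. It only mirrors the behaviour described above (each entry adds a new column computed from the existing row). It is not Data-Forge's implementation, and the helper addColumns is hypothetical.

```typescript
// Hypothetical helper that mirrors the idea behind generateSeries on plain objects.
type NumRow = Record<string, number>;
type ColumnGenerators = Record<string, (row: NumRow) => number>;

function addColumns(rows: NumRow[], generators: ColumnGenerators): NumRow[] {
    // Build new row objects; the input rows are never modified.
    return rows.map(row => {
        const extra: NumRow = {};
        for (const name of Object.keys(generators)) {
            extra[name] = generators[name](row);
        }
        return { ...row, ...extra };
    });
}

const rows: NumRow[] = [
    { A: 1, B: 10 }, { A: 2, B: 20 }, { A: 3, B: 30 }, { A: 4, B: 40 }, { A: 5, B: 50 },
];

const result = addColumns(rows, {
    C: row => row.A * 2,
    D: row => row.B / 10,
});

console.log(result[0]); // { A: 1, B: 10, C: 2, D: 1 }
```

Because addColumns returns new objects, the original rows still contain only A and B. This matches the immutable style Data-Forge advertises.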

Grouping and aggregating data:

import { readFile } from 'data-forge-fs';

// Await the parsed DataFrame first; parseCSV() returns a Promise.
const summary = (await readFile('sales.csv').parseCSV())
    .parseFloats(['sales', 'price']) // Convert numeric columns from strings.
    .groupBy(row => row.category)
    .select(group => ({
        category: group.first().category,
        totalSales: group.deflate(row => row.sales).sum(),
        averagePrice: group.deflate(row => row.price).average()
    }))
    .toArray();

console.log(summary);
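
The group-then-aggregate step can also be shown without the library. The sketch below is plain TypeScript over made-up sample rows that are assumed to be numeric already (values read from CSV start out as strings). summarizeByCategory is a hypothetical name, not a Data-Forge function.

```typescript
// Made-up sample rows standing in for sales.csv after numeric parsing.
interface Sale { category: string; sales: number; price: number; }
interface CategorySummary { category: string; totalSales: number; averagePrice: number; }

const sales: Sale[] = [
    { category: "books", sales: 3, price: 12 },
    { category: "games", sales: 1, price: 60 },
    { category: "books", sales: 5, price: 8 },
];

function summarizeByCategory(rows: Sale[]): CategorySummary[] {
    // Group rows by category; a Map keeps groups in first-seen order.
    const groups = new Map<string, Sale[]>();
    for (const row of rows) {
        const group = groups.get(row.category);
        if (group) {
            group.push(row);
        } else {
            groups.set(row.category, [row]);
        }
    }

    // Aggregate each group: sum of sales, mean of price.
    const summaries: CategorySummary[] = [];
    groups.forEach((group, category) => {
        const totalSales = group.reduce((sum, r) => sum + r.sales, 0);
        const averagePrice = group.reduce((sum, r) => sum + r.price, 0) / group.length;
        summaries.push({ category, totalSales, averagePrice });
    });
    return summaries;
}

console.log(summarizeByCategory(sales));
```

For these sample rows, books has 8 total sales at an average price of 10, and games has 1 sale at 60.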

Getting Started

To get started with Data-Forge, follow these steps:

  1. Install Data-Forge:

    npm install data-forge
    
  2. Import and use Data-Forge in your TypeScript or JavaScript project:

    import { DataFrame } from 'data-forge';
    
    const df = new DataFrame({
        columns: {
            Name: ['John', 'Jane', 'Bob'],
            Age: [25, 30, 35]
        }
    });
    
    console.log(df.toString());
    
  3. Explore the documentation and examples on the official Data-Forge website to learn more about its capabilities and advanced features.

Competitor Comparisons

pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Pros of pandas

  • Extensive functionality and mature ecosystem
  • Highly optimized for performance with C extensions
  • Large community and extensive documentation

Cons of pandas

  • Steeper learning curve for beginners
  • Memory-intensive for large datasets
  • Python-specific, not easily portable to other languages

Code Comparison

pandas:

import pandas as pd

df = pd.read_csv('data.csv')
filtered = df[df['column'] > 5]
result = filtered.groupby('category').mean()

Data-Forge:

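import { readFile } from 'data-forge-fs';

const df = (await readFile('data.csv').parseCSV()).parseFloats('column');
const filtered = df.where(row => row.column > 5);
const result = filtered
    .groupBy(row => row.category)
    .select(group => ({
        category: group.first().category,
        column: group.deflate(row => row.column).average() // Mean per category.
    })).inflate(); // Series of rows -> DataFrame.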

Both libraries offer similar functionality for data manipulation, but pandas has a more concise syntax due to its specialized data structures. Data-Forge follows a more functional programming approach and is designed for TypeScript, making it more suitable for JavaScript/TypeScript developers working with data in web applications or Node.js environments.

Apache Arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Pros of Arrow

  • Highly performant and memory-efficient columnar memory format
  • Supports multiple programming languages and platforms
  • Extensive ecosystem and community support

Cons of Arrow

  • Steeper learning curve due to its complexity
  • May be overkill for smaller-scale data processing tasks
  • Requires more setup and configuration

Code Comparison

Arrow (C++):

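#include <arrow/api.h>

// Build an Int64 array; Arrow APIs report errors through arrow::Status.
arrow::Status BuildValues(std::shared_ptr<arrow::Array>* out) {
    arrow::Int64Builder builder(arrow::default_memory_pool());
    ARROW_RETURN_NOT_OK(builder.AppendValues({1, 2, 3, 4, 5}));
    return builder.Finish(out);
}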

Data-Forge-TS (TypeScript):

import { DataFrame } from 'data-forge';

const df = new DataFrame({
    columnNames: ["Value"],
    rows: [[1], [2], [3], [4], [5]]
});

Key Differences

  • Arrow focuses on efficient data representation and interoperability
  • Data-Forge-TS is more oriented towards data manipulation and analysis in TypeScript
  • Arrow has a broader scope and supports multiple languages, while Data-Forge-TS targets only JavaScript and TypeScript
  • Arrow is better suited for large-scale data processing, while Data-Forge-TS is more appropriate for smaller datasets and simpler operations
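
To make "columnar memory format" more concrete, the sketch below converts row objects into one typed array per column, which is the general idea behind Arrow's layout. It is a plain TypeScript illustration, not the Arrow format itself or any Arrow package API; toColumns and sumColumn are hypothetical names.

```typescript
// Row-oriented: one object per record.
interface Point { id: number; value: number; }

// Column-oriented: one contiguous typed array per field.
interface PointColumns { id: Int32Array; value: Float64Array; }

function toColumns(points: Point[]): PointColumns {
    const id = new Int32Array(points.length);
    const value = new Float64Array(points.length);
    points.forEach((p, i) => {
        id[i] = p.id;
        value[i] = p.value;
    });
    return { id, value };
}

// Scanning one column touches a single contiguous buffer, not every record object.
function sumColumn(column: Float64Array): number {
    let total = 0;
    for (let i = 0; i < column.length; i++) {
        total += column[i];
    }
    return total;
}

const points: Point[] = [{ id: 1, value: 1.5 }, { id: 2, value: 2.5 }, { id: 3, value: 4 }];
const columns = toColumns(points);
console.log(sumColumn(columns.value)); // 8
```

Keeping each column contiguous is what lets columnar engines scan and share data efficiently. Row objects are more convenient for record-at-a-time code, which is closer to how the Data-Forge examples above are written.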

Dask

Parallel computing with task scheduling

Pros of Dask

  • Designed for large-scale parallel computing and distributed processing
  • Integrates well with the Python scientific ecosystem (NumPy, Pandas)
  • Supports complex workflows and task scheduling

Cons of Dask

  • Steeper learning curve, especially for distributed computing concepts
  • More complex setup and configuration for distributed environments
  • Potentially overkill for smaller datasets or simpler data processing tasks

Code Comparison

Data-Forge-TS:

import { DataFrame } from 'data-forge';

const df = new DataFrame([
    { A: 1, B: 10 },
    { A: 2, B: 20 },
]);
const result = df.getSeries('A').select(a => a * 2); // Series of doubled A values.

Dask:

import dask.dataframe as dd
import pandas as pd

df = dd.from_pandas(pd.DataFrame({
    'A': [1, 2],
    'B': [10, 20]
}), npartitions=2)
result = df['A'] * 2

Key Differences

  • Data-Forge-TS is TypeScript-based, while Dask is Python-based
  • Dask focuses on distributed computing and big data, while Data-Forge-TS is more suited for in-memory data processing
  • Dask has a wider range of data structures and integrations with scientific libraries
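
The npartitions=2 argument in the Dask snippet is what enables parallelism: the data is split into chunks and the same operation runs on each chunk. Below is a minimal, single-threaded TypeScript sketch of that idea. partition and mapPartitions are hypothetical helpers; Dask's task graph, scheduler and workers are not modelled.

```typescript
// Split an array into at most `npartitions` contiguous chunks of near-equal size.
function partition<T>(values: T[], npartitions: number): T[][] {
    const size = Math.max(1, Math.ceil(values.length / npartitions));
    const parts: T[][] = [];
    for (let start = 0; start < values.length; start += size) {
        parts.push(values.slice(start, start + size));
    }
    return parts;
}

// Apply the same element-wise operation to every partition independently.
// In Dask each partition could run on a different worker; here it is sequential.
function mapPartitions<T, U>(parts: T[][], fn: (value: T) => U): U[][] {
    return parts.map(part => part.map(fn));
}

const values = [1, 2, 3, 4, 5];
const parts = partition(values, 2); // [[1, 2, 3], [4, 5]]
const doubled = mapPartitions(parts, x => x * 2);
const flattened = ([] as number[]).concat(...doubled);
console.log(flattened); // [ 2, 4, 6, 8, 10 ]
```

Because each partition is processed independently, the work can be spread across cores or machines. An in-memory library such as Data-Forge-TS processes the whole dataset in one place instead.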

Modin

Modin: Scale your Pandas workflows by changing a single line of code

Pros of Modin

  • Designed for large-scale data processing with distributed computing capabilities
  • Seamless integration with existing pandas code, requiring minimal changes
  • Can be significantly faster than pandas on large datasets

Cons of Modin

  • Not every pandas operation is implemented natively; unsupported operations fall back to pandas
  • Requires additional setup and dependencies for distributed computing
  • May have overhead for small datasets, potentially slower than pandas

Code Comparison

Modin:

import modin.pandas as pd

df = pd.read_csv("large_dataset.csv")
result = df.groupby("category").mean()

Data-Forge-TS:

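import { readFile } from "data-forge-fs";

const df = (await readFile("large_dataset.csv").parseCSV()).parseFloats("value"); // "value" is a placeholder column.
const result = df.groupBy(row => row.category).select(group => ({
    category: group.first().category,
    value: group.deflate(row => row.value).average() // Mean per category.
})).inflate();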

Key Differences

  • Modin focuses on scaling pandas operations for big data, while Data-Forge-TS is a TypeScript data manipulation library
  • Modin aims for pandas compatibility, whereas Data-Forge-TS has its own API design
  • Modin is better suited for large-scale data processing, while Data-Forge-TS is more appropriate for smaller datasets and TypeScript projects

Vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Pros of Vaex

  • Designed to handle very large tabular datasets efficiently (its tagline cites a billion rows per second)
  • Supports out-of-core computing, allowing processing of data larger than RAM
  • Offers advanced visualization capabilities for big data

Cons of Vaex

  • Primarily focused on tabular data, less versatile for other data structures
  • Steeper learning curve due to its specialized nature
  • Python-only, so there is no direct integration with the TypeScript ecosystem

Code Comparison

Data-Forge-TS:

import { DataFrame } from 'data-forge';

const df = new DataFrame({
    columnNames: ["A", "B", "C"],
    rows: [[1, 2, 3], [4, 5, 6]]
});

Vaex:

import vaex

df = vaex.from_arrays(
    A=[1, 4], B=[2, 5], C=[3, 6]
)

Summary

Vaex excels in handling extremely large datasets and provides powerful visualization tools, making it ideal for big data analysis. However, it may be overkill for smaller projects and has a steeper learning curve. Data-Forge-TS, on the other hand, offers a more general-purpose data manipulation library with better TypeScript integration, but may not be as efficient for very large datasets. The choice between the two depends on the specific requirements of your project, particularly in terms of data size and processing needs.
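
Out-of-core processing, mentioned above as Vaex's strength, means working through data in chunks so the full dataset never has to fit in RAM. The TypeScript sketch below shows the core trick for a mean: carry only a running sum and count across chunks. readChunks is a hypothetical generator that produces synthetic data in place of reading a file. Vaex's memory-mapped storage is not modelled.

```typescript
// Hypothetical stand-in for reading a large file piece by piece.
// Only one chunk exists in memory at any time.
function* readChunks(totalRows: number, chunkSize: number): IterableIterator<number[]> {
    for (let start = 0; start < totalRows; start += chunkSize) {
        const chunk: number[] = [];
        const end = Math.min(start + chunkSize, totalRows);
        for (let i = start; i < end; i++) {
            chunk.push(i); // Synthetic values 0, 1, 2, ...
        }
        yield chunk;
    }
}

// Streaming mean: only a running sum and count survive between chunks.
function streamingMean(chunks: Iterable<number[]>): number {
    let sum = 0;
    let count = 0;
    for (const chunk of chunks) {
        for (const value of chunk) {
            sum += value;
            count += 1;
        }
    }
    return count === 0 ? NaN : sum / count;
}

console.log(streamingMean(readChunks(1000000, 10000))); // 499999.5
```

Aggregations that can be expressed as running state (sum, count, min, max) work this way. Operations that need the whole dataset at once, such as an exact median, are harder to do out of core.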

Polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Pros of Polars

  • Written in Rust, offering high performance and memory efficiency
  • Supports both eager and lazy execution modes
  • Provides a wide range of data manipulation and analysis functions

Cons of Polars

  • Steeper learning curve due to its Rust foundations
  • Less integrated with the TypeScript/JavaScript ecosystem
  • May require additional setup for use in web-based environments

Code Comparison

Data-Forge-TS:

import { DataFrame } from 'data-forge';

const df = new DataFrame([
  { A: 1, B: 'x' },
  { A: 2, B: 'y' },
]);
const filtered = df.where(row => row.A > 1);

Polars:

import polars as pl

df = pl.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
filtered = df.filter(pl.col('A') > 1)

Summary

Polars is a high-performance data manipulation library written in Rust, offering excellent speed and efficiency. It provides a rich set of features for data analysis and supports both eager and lazy execution. However, it may have a steeper learning curve and less seamless integration with TypeScript/JavaScript projects compared to Data-Forge-TS.

Data-Forge-TS, on the other hand, is specifically designed for TypeScript and JavaScript environments, making it more accessible for web developers. While it may not match Polars in raw performance, it offers a familiar API and easier integration with existing TypeScript/JavaScript codebases.
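
The eager/lazy distinction mentioned for Polars can be illustrated in a few lines of TypeScript. An eager step runs immediately, while a lazy query only records steps and runs them when results are requested. LazyQuery is a hypothetical class for illustration; it does not reproduce Polars' API or its query optimiser.

```typescript
type QueryRow = Record<string, number | string>;

// Lazy: steps are recorded; nothing runs until collect() is called.
// A real engine would also optimise the recorded plan before running it.
class LazyQuery {
    private constructor(
        private readonly source: QueryRow[],
        private readonly steps: ReadonlyArray<(rows: QueryRow[]) => QueryRow[]>,
    ) {}

    static from(rows: QueryRow[]): LazyQuery {
        return new LazyQuery(rows, []);
    }

    // Returns a new query with one more recorded step; the original is unchanged.
    filter(pred: (r: QueryRow) => boolean): LazyQuery {
        return new LazyQuery(this.source, [...this.steps, rows => rows.filter(pred)]);
    }

    collect(): QueryRow[] {
        return this.steps.reduce((rows, step) => step(rows), this.source);
    }
}

const data: QueryRow[] = [{ A: 1, B: "x" }, { A: 2, B: "y" }];
let evaluated = 0;
const query = LazyQuery.from(data).filter(r => {
    evaluated++;
    return (r.A as number) > 1;
});
console.log(evaluated);       // 0: nothing has run yet
console.log(query.collect()); // [ { A: 2, B: 'y' } ]
console.log(evaluated);       // 2
```

Deferring work until collect() gives an engine the whole plan at once, which is what makes optimisations such as predicate pushdown possible.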

README

Data-Forge

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Implemented in TypeScript.
Used in JavaScript ES5+ or TypeScript.

To learn more about Data-Forge visit the home page.

Read about Data-Forge for data science in the book JavaScript for Data Science.

Love this? Please star this repo and click here to support my work

Please note that this TypeScript repository replaces the previous JavaScript version of Data-Forge.

BREAKING CHANGES

As of v1.6.9 the dependencies Sugar, Lodash and Moment have been factored out (or replaced with smaller dependencies). This more than halves the bundle size. Hopefully this won't cause any problems - but please log an issue if something changes that you weren't expecting.

As of v1.3.0 file system support has been removed from the Data-Forge core API. This follows repeated issues from users trying to get Data-Forge working in the browser, especially under Angular 6.

Functions for reading and writing files have been moved to the separate code library Data-Forge FS.

If you use the file read and write functions from versions prior to 1.3.0, your code will stop working when you upgrade to 1.3.0. The fix is simple. Where previously you would require only Data-Forge:

const dataForge = require('data-forge');

Now you must also require the new library:

const dataForge = require('data-forge');
require('data-forge-fs');

Data-Forge FS augments Data-Forge core so that you can use the readFile/writeFile functions as in previous versions and as is shown in this readme and the guide.
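
The line require('data-forge-fs') works by side effect: loading the second package adds functions to the first. The sketch below shows one common way a plugin can do this in TypeScript, using prototype patching plus declaration merging. It illustrates the general pattern under that assumption. It is not the Data-Forge FS source, and CoreFrame, toCSVText and installCsvPlugin are hypothetical names.

```typescript
// Stand-in for a core library class that has no serialisation methods of its own.
class CoreFrame {
    constructor(public readonly rows: string[][]) {}
}

// Declaration merging: tells TypeScript about the method the plugin will add.
interface CoreFrame {
    toCSVText(): string;
}

// The "plugin": running it patches the core prototype as a side effect.
function installCsvPlugin(): void {
    CoreFrame.prototype.toCSVText = function (this: CoreFrame): string {
        return this.rows.map(row => row.join(",")).join("\n");
    };
}

installCsvPlugin();
const frame = new CoreFrame([["a", "b"], ["1", "2"]]);
console.log(frame.toCSVText()); // prints "a,b" then "1,2" on separate lines
```

Calling toCSVText before installCsvPlugin would fail at runtime, which mirrors why code written for pre-1.3.0 versions breaks until the extra require is added.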

If you still have problems with Angular 6, please see this workaround: https://github.com/data-forge/data-forge-ts/issues/3#issuecomment-438580174

Install

To install for Node.js and the browser:

npm install --save data-forge

If working in Node.js and you want the functions to read and write data files:

npm install --save data-forge-fs

Quick start

Data-Forge can load CSV, JSON or arbitrary data sets.

Parse the data, filter it, transform it, aggregate it, sort it and much more.

Use the data however you want or export it to CSV or JSON.

Here's an example:

const dataForge = require('data-forge');
require('data-forge-fs'); // For readFile/writeFile.

// Placeholder filter and transform; replace these with your own logic.
const predicate = row => row["Column C"] > 0;
const transform = row => ({ ...row, "Column E": row["Column E"] * 100 });

dataForge.readFileSync('./input-data-file.csv') // Read CSV file (or JSON!)
    .parseCSV()
    .parseDates(["Column A"]) // Parse date columns.
    .parseInts(["Column B", "Column C"]) // Parse integer columns.
    .parseFloats(["Column D", "Column E"]) // Parse float columns.
    .dropSeries(["Column F"]) // Drop certain columns.
    .where(row => predicate(row)) // Filter rows.
    .select(row => transform(row)) // Transform the data.
    .asCSV()
    .writeFileSync("./output-data-file.csv"); // Write to output CSV file (or JSON!)
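
The parseDates/parseInts/parseFloats calls matter because CSV parsing yields strings, and strings compare lexicographically. The plain TypeScript sketch below (a naive parser with no quoting support, plus a hypothetical parseNumberColumns helper) shows the difference. It is not Data-Forge's parser.

```typescript
// Naive CSV split for illustration only: no quoted fields or embedded commas.
function parseSimpleCSV(text: string): Record<string, string>[] {
    const [headerLine, ...lines] = text.trim().split("\n");
    const headers = headerLine.split(",");
    return lines.map(line => {
        const cells = line.split(",");
        const row: Record<string, string> = {};
        headers.forEach((h, i) => { row[h] = cells[i]; });
        return row;
    });
}

// Convert the named columns from strings to numbers; other columns stay strings.
function parseNumberColumns(
    rows: Record<string, string>[],
    columns: string[],
): Record<string, string | number>[] {
    return rows.map(row => {
        const copy: Record<string, string | number> = { ...row };
        for (const c of columns) {
            copy[c] = Number(row[c]);
        }
        return copy;
    });
}

const raw = parseSimpleCSV("name,score\nann,9\nbob,10");
console.log(raw[1].score > raw[0].score); // false: "10" sorts before "9" as a string
const typed = parseNumberColumns(raw, ["score"]);
console.log((typed[1].score as number) > (typed[0].score as number)); // true
```

A filter such as row.score > 9 would give surprising results on the raw strings, which is why the example above parses its numeric and date columns before filtering.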

From the browser

Data-Forge has been tested with Browserify and Webpack. Please see links to examples below.

If you aren't using Browserify or Webpack, the npm package includes a prepacked browser distribution that you can include in your HTML as follows:

<script language="javascript" type="text/javascript" src="node_modules/data-forge/dist/web/index.js"></script>

This gives you the data-forge package mounted under the global variable dataForge.

Please remember that you can't use data-forge-fs or the file system functions in the browser.

Features

  • Import and export CSV and JSON data and text files (when using Data-Forge FS).
  • Or work with arbitrary JavaScript data.
  • Many options for working with your data:
    • Filtering
    • Transformation
    • Extracting subsets
    • Grouping, aggregation and summarization
    • Sorting
    • And much more
  • Great for slicing and dicing tabular data:
    • Add, remove, transform and generate named columns (series) of data.
  • Great for working with time series data.
  • Your data is indexed so you have the ability to merge and aggregate.
  • Your data is immutable! Transformations and modifications produce a new dataset.
  • Build data pipelines that are evaluated lazily.
  • Inspired by Pandas and LINQ, so it might feel familiar!
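
Immutability and lazily evaluated pipelines can be illustrated with JavaScript generators. The sketch below is a plain TypeScript analogy that uses hypothetical where/select functions named after the Data-Forge methods. It is not Data-Forge's internal implementation.

```typescript
// Each stage wraps an iterable in a generator; nothing runs until someone iterates.
function* where<T>(source: Iterable<T>, pred: (v: T) => boolean): IterableIterator<T> {
    for (const v of source) {
        if (pred(v)) yield v;
    }
}

function* select<T, U>(source: Iterable<T>, fn: (v: T) => U): IterableIterator<U> {
    for (const v of source) {
        yield fn(v);
    }
}

const input: ReadonlyArray<number> = [1, 2, 3, 4, 5, 6];
const log: string[] = [];

// Building the pipeline does no work yet.
const pipeline = select(
    where(input, n => { log.push(`test ${n}`); return n % 2 === 0; }),
    n => n * 10,
);
console.log(log.length); // 0

// Iterating pulls values through both stages one at a time.
const output = Array.from(pipeline);
console.log(output); // [ 20, 40, 60 ]
console.log(input);  // [ 1, 2, 3, 4, 5, 6 ]: the source is unchanged
```

Note that a generator can be consumed only once, so a reusable pipeline needs to recreate its generators on each iteration.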

Contributions

Want a bug fixed or maybe to improve performance?

Don't see your favourite feature?

Need to add your favourite Pandas or LINQ feature?

Please contribute and help improve this library for everyone!

Fork it, make a change, submit a pull request. Want to chat? See my contact details at the end or reach out on Gitter.

Contact

Please reach out and tell me what you are doing with Data-Forge or how you'd like to see it improved.

Support the developer

Click here to support the developer.
