nucleuscloud/neosync

Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.

Top Related Projects

  • DataHub (9,797 stars): The Metadata Platform for your Data Stack
  • Airbyte (15,871 stars): The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
  • Meltano (1,813 stars): The declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
  • Prefect (16,099 stars): A workflow orchestration framework for building resilient data pipelines in Python.
  • Apache Airflow (36,684 stars): A platform to programmatically author, schedule, and monitor workflows

Quick Overview

Neosync is an open-source tool for anonymizing production data, generating synthetic data, and syncing that data across environments for development and testing. It aims to provide realistic test data while maintaining data privacy and compliance.

Pros

  • Enhances data privacy and security by generating synthetic data instead of using production data
  • Supports multiple data sources and destinations, including Postgres, MySQL, and S3
  • Offers a user-friendly web interface for easy configuration and management
  • Provides customizable data generation rules and transformers
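
To make "data generation rules" concrete, here is a minimal Python sketch of generating synthetic rows from a schema. It is not Neosync's implementation or configuration format: the SCHEMA mapping, the generator kinds, and both function names are hypothetical and exist only for illustration.

```python
import random
import string

# Hypothetical schema: column name -> generator kind (not a Neosync format).
SCHEMA = {"id": "int", "name": "string", "email": "email"}


def generate_value(kind: str, rng: random.Random):
    """Return one synthetic value for the given generator kind."""
    if kind == "int":
        return rng.randint(1, 1_000_000)
    if kind == "string":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    if kind == "email":
        user = "".join(rng.choices(string.ascii_lowercase, k=8))
        return f"{user}@example.com"
    raise ValueError(f"unsupported generator kind: {kind}")


def generate_rows(schema: dict, count: int, seed: int = 0) -> list:
    """Generate `count` rows; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [
        {column: generate_value(kind, rng) for column, kind in schema.items()}
        for _ in range(count)
    ]


rows = generate_rows(SCHEMA, 3)
```

Seeding the generator is a deliberate choice here: reproducible rows make test failures easier to replay.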

Cons

  • Still in early development stages, which may lead to potential instability or missing features
  • Limited documentation and community support compared to more established tools
  • May require additional setup and configuration for complex data scenarios
  • Performance and scalability for large datasets are yet to be thoroughly tested

Getting Started

To get started with Neosync, follow these steps:

  1. Clone the repository:

    git clone https://github.com/nucleuscloud/neosync.git
    cd neosync
    
  2. With Docker installed and running, start Neosync using the compose.yml file at the root of the repository:

    make compose/up
    
  3. Access the web interface at http://localhost:3000. The production compose setup is pre-seeded with connections and jobs, so you can run the generate and sync job to see Neosync in action.

  4. To stop Neosync, run:

    make compose/down

For more detailed instructions and configuration options, refer to the Deploy Neosync section of the project's documentation.

Competitor Comparisons

DataHub (9,797 stars)

The Metadata Platform for your Data Stack

Pros of DataHub

  • More comprehensive data catalog and metadata management system
  • Larger community and ecosystem with extensive documentation
  • Supports a wider range of data sources and integrations

Cons of DataHub

  • More complex setup and configuration process
  • Heavier resource requirements for deployment and operation
  • Steeper learning curve for new users

Code Comparison

DataHub (Python):

from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create({
    "source": {"type": "mysql", "config": {...}},
    "sink": {"type": "datahub-rest", "config": {...}}
})
pipeline.run()

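Neosync (Go, illustrative pseudocode; the neosync package, types, and methods shown are placeholders rather than the actual SDK API):

// Configuration details are elided.
client := neosync.NewClient(neosync.ClientConfig{ /* ... */ })
job := &neosync.Job{
    Source:      neosync.Source{Type: "mysql" /* Config: ... */},
    Destination: neosync.Destination{Type: "postgres" /* Config: ... */},
}
err := client.RunJob(job)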

DataHub offers a more extensive data catalog solution with broader integration capabilities, while Neosync focuses on data synchronization and transformation. DataHub's larger community and ecosystem provide more resources and support, but it comes with increased complexity and resource requirements. Neosync, being more specialized, may offer a simpler setup for specific data synchronization tasks.

Airbyte (15,871 stars)

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Pros of Airbyte

  • Extensive library of pre-built connectors for various data sources and destinations
  • Strong community support and regular updates
  • Comprehensive documentation and user guides

Cons of Airbyte

  • More complex setup and configuration process
  • Higher resource requirements for deployment and operation
  • Steeper learning curve for new users

Code Comparison

Neosync (Go):

func (j *Job) Run(ctx context.Context) error {
    // Job execution logic
    return nil
}

Airbyte (Java):

public class Job implements Runnable {
    @Override
    public void run() {
        // Job execution logic
    }
}

The two projects use different programming languages: Neosync is written primarily in Go, while Airbyte uses Java. The snippets show only a basic job execution skeleton; Airbyte's real implementation is typically more complex because of its broader scope and feature set.

Neosync focuses on data synchronization and anonymization, while Airbyte is a more comprehensive data integration platform. Airbyte offers a wider range of connectors and data transformation capabilities, but this comes at the cost of increased complexity and resource requirements. Neosync, being more specialized, may offer a simpler solution for specific use cases related to data synchronization and privacy.

Meltano (1,813 stars)

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Pros of Meltano

  • More mature project with a larger community and ecosystem
  • Supports a wider range of data sources and destinations
  • Offers a comprehensive ELT (Extract, Load, Transform) workflow

Cons of Meltano

  • Steeper learning curve due to its extensive features
  • Requires more setup and configuration for simple tasks
  • Heavier resource usage for small-scale projects

Code Comparison

Meltano configuration (meltano.yml):

plugins:
  extractors:
    - name: tap-github
      variant: meltanolabs
  loaders:
    - name: target-postgres
      variant: transferwise

Neosync configuration (neosync.yaml):

sources:
  - name: github
    type: github
destinations:
  - name: postgres
    type: postgres

Summary

Meltano is a more comprehensive data integration tool with a broader range of features and integrations, making it suitable for complex data pipelines and large-scale projects. However, this comes at the cost of increased complexity and resource requirements.

Neosync, on the other hand, appears to be a simpler, more focused tool that may be easier to set up and use for smaller projects or specific use cases. It likely has a gentler learning curve but may lack some of the advanced features and integrations offered by Meltano.

The choice between the two would depend on the specific requirements of the project, the scale of data operations, and the level of complexity needed in the data integration process.

Prefect (16,099 stars)

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

Pros of Prefect

  • More mature and widely adopted project with a larger community
  • Extensive documentation and tutorials available
  • Supports a wider range of data workflows and integrations

Cons of Prefect

  • Steeper learning curve for beginners
  • Can be overkill for simpler data synchronization tasks
  • Requires more setup and configuration

Code Comparison

Neosync (JavaScript):

const job = new Job({
  name: 'sync-data',
  source: 'postgres',
  destination: 'mysql',
  transformations: [maskPII, anonymizeNames]
});

Prefect (Python):

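from prefect import flow, task

@task
def extract():
    # Extract data from source (placeholder rows for illustration)
    return [{"name": "Alice"}]

@task
def transform(data):
    # Apply transformations (placeholder: redact the name column)
    return [{**row, "name": "REDACTED"} for row in data]

@task
def load(data):
    # Load data to destination (placeholder: print the rows)
    print(data)

@flow
def etl_flow():
    data = extract()
    transformed = transform(data)
    load(transformed)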

Neosync focuses on straightforward data synchronization with built-in anonymization features, while Prefect offers a more comprehensive workflow orchestration platform. Neosync may be easier to set up for simple sync tasks, but Prefect provides greater flexibility and scalability for complex data pipelines.

Apache Airflow (36,684 stars)

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Pros of Airflow

  • Mature and widely adopted workflow orchestration platform with a large community
  • Extensive library of pre-built operators and integrations
  • Flexible scheduling and dependency management capabilities

Cons of Airflow

  • Steeper learning curve and more complex setup compared to Neosync
  • Heavier resource requirements, especially for large-scale deployments
  • Less focused on data synchronization and generation tasks

Code Comparison

Airflow DAG definition:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def my_function():
    # Placeholder task body
    print("Hello from Airflow")


dag = DAG('example_dag', start_date=datetime(2023, 1, 1))

task = PythonOperator(
    task_id='example_task',
    python_callable=my_function,
    dag=dag
)

Neosync job configuration:

version: 1
jobs:
  - name: example-job
    source:
      postgres:
        connection: source_db
    destination:
      postgres:
        connection: destination_db
    mappings:
      - source_schema: public
        source_table: users
        destination_schema: public
        destination_table: users

While Airflow provides a more general-purpose workflow orchestration platform, Neosync focuses specifically on data synchronization and generation tasks. Airflow offers greater flexibility and extensibility, but Neosync provides a more streamlined experience for its targeted use case.

README

Open Source Data Anonymization and Synthetic Data Orchestration


Introduction

Neosync is an open-source, developer-first way to anonymize PII, generate synthetic data and sync environments for better testing, debugging and developer experience.

Companies use Neosync to:

  1. Safely test code against production data - Anonymize sensitive production data in order to safely use it locally for a better testing and developer experience
  2. Easily reproduce production bugs locally - Anonymize and subset production data to get a safe, representative data set that you can use to locally reproduce production bugs quickly and efficiently
  3. High quality data for lower-level environments - Catch bugs before they hit production when you hydrate your staging and QA environments with production-like data
  4. Solve GDPR, DPDP, FERPA, HIPAA and more - Use anonymized and synthetic data to reduce your compliance scope and easily comply with laws like HIPAA, GDPR, and DPDP
  5. Seed development databases - Easily seed development databases with synthetic data for unit testing, demos and more
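
As a rough illustration of the anonymization idea behind the first two use cases, the Python sketch below replaces an email address with a keyed, deterministic pseudonym. This is not how Neosync implements its transformers: the key, the function names, and the row layout are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical key for the example; a real setup would load it from a secret store.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize_email(email: str, key: bytes = SECRET_KEY) -> str:
    """Map an email to a stable pseudonym using HMAC-SHA256."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


def anonymize_row(row: dict) -> dict:
    """Return a copy of the row with the assumed PII columns masked."""
    masked = dict(row)
    masked["email"] = pseudonymize_email(row["email"])
    masked["name"] = "REDACTED"
    return masked
```

Because the mapping is deterministic, the same input email yields the same pseudonym in every table, so joins on that column still line up after anonymization.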

Features

  • Generate synthetic data based on your schema
  • Anonymize existing production data for a better developer experience
  • Subset your production database for local and CI testing using any SQL query
  • Complete async pipeline that automatically handles job retries, failures and playback using an event-sourcing model
  • Automatic referential integrity for your data
  • Declarative, GitOps based configs as a step in your CI pipeline to hydrate your CI DB
  • Pre-built data transformers for all major data types
  • Custom data transformers using JavaScript or LLMs
  • Pre-built integrations with Postgres, MySQL, S3
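
The subsetting and referential integrity features can be pictured with a small Python sketch that uses the standard-library sqlite3 module. It only illustrates the concept and is not Neosync's implementation: the users and orders tables, the sample rows, and the function names are hypothetical.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    """Create a small in-memory source database with a parent and child table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            total REAL
        );
        INSERT INTO users VALUES (1, 'US'), (2, 'DE'), (3, 'US');
        INSERT INTO orders VALUES
            (10, 1, 9.5), (11, 2, 20.0), (12, 3, 5.0), (13, 2, 7.25);
        """
    )
    return conn


def subset(conn: sqlite3.Connection, user_filter: str):
    """Select users matching a trusted SQL filter, then only their orders.

    The filter is interpolated into the query, so it must come from a
    trusted source such as a checked-in config, never from end users.
    """
    users = conn.execute(
        f"SELECT id, country FROM users WHERE {user_filter} ORDER BY id"
    ).fetchall()
    ids = [user[0] for user in users]
    if not ids:
        return users, []
    placeholders = ",".join("?" for _ in ids)
    orders = conn.execute(
        "SELECT id, user_id, total FROM orders "
        f"WHERE user_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    return users, orders


users, orders = subset(build_source(), "country = 'US'")
```

Every order in the result points at a user that is also in the result, which is the property a subset needs in order to load cleanly into a database with foreign keys.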

Getting started

Neosync is a fully dockerized setup which makes it easy to get up and running.

A compose.yml file at the root contains production image refs that allow you to get up and running with just a few commands without having to build anything on your system.

Neosync uses the newer docker compose command, so be sure to have that installed on your machine.

To start Neosync, clone the repo into a local directory, make sure Docker is installed and running, and then run:

make compose/up

Neosync will now be available on http://localhost:3000.

To stop, run:

make compose/down

The production compose pre-seeds with connections and jobs to get you started! Simply run the generate and sync job to watch Neosync in action!
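
If you script the startup (for example in CI), it can help to wait until the web interface responds before continuing. The following Python sketch polls a URL using only the standard library. The function name and the retry defaults are assumptions for illustration, not part of Neosync.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` until the server answers or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError as err:
            # The server answered; treat anything below 500 as "up".
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not up yet.
            pass
        time.sleep(delay)
    return False


# Example call once the stack has been started:
# wait_until_up("http://localhost:3000")
```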

Kubernetes, Auth Mode and more

For more in-depth details on environment variables, Kubernetes deployments, and a production-ready guide, check out the Deploy Neosync section of our Docs.

Resources

Some resources to help you along the way:

  • Docs for comprehensive documentation and guides
  • Discord for discussion with the community and Neosync team
  • X for the latest updates

Contributing

We love contributions big and small. Here are just a few ways that you can contribute to Neosync.

Licensing

We strongly believe in free and open source software and make this repo available under the MIT Expat license.