csvq

SQL-like query language for csv

1,577

View on GitHub

Top Related Projects

clickhouse-java

1,548

ClickHouse Java Clients & JDBC Driver

spark

41,366

Apache Spark - A unified analytics engine for large-scale data processing

drill

1,985

Apache Drill is a distributed MPP query layer for self describing data

Quick Overview

csvq is a SQL-like query language for CSV, TSV, and other delimited data files. It allows users to perform various data manipulation and analysis tasks on tabular data files using a familiar SQL-like syntax.

Pros

SQL-like Syntax: csvq provides a SQL-like query language, making it easy for users familiar with SQL to work with tabular data files.
Supports Multiple File Formats: csvq can handle a variety of delimited data formats, including CSV, TSV, and others.
Powerful Data Manipulation: csvq offers a wide range of data manipulation capabilities, including filtering, sorting, aggregation, and more.
Cross-platform Compatibility: csvq is a cross-platform tool, available for Windows, macOS, and Linux.

Cons

Limited to Tabular Data: csvq is designed specifically for working with tabular data files, and may not be suitable for more complex data structures.
Learning Curve: While the SQL-like syntax is familiar, users may still need to learn the specific syntax and capabilities of csvq.
Performance Limitations: For very large data sets, csvq's performance may not be as efficient as specialized data analysis tools.
Lack of Advanced Visualization: csvq is focused on data manipulation and analysis, and does not provide advanced data visualization capabilities.

Code Examples

Here are a few examples of how to use csvq:

Selecting Columns:

SELECT name, age, email FROM data.csv

This query selects the name, age, and email columns from the data.csv file.

Filtering Data:

SELECT * FROM data.csv WHERE age > 30 AND city = 'New York'

This query selects all rows from data.csv where the age is greater than 30 and the city is 'New York'.

Aggregating Data:

SELECT city, AVG(age) AS avg_age FROM data.csv GROUP BY city

This query calculates the average age for each unique city in the data.csv file.

Joining Data:

SELECT users.name, orders.order_date, orders.total
FROM users.csv
JOIN orders.csv ON users.id = orders.user_id

This query joins the users.csv and orders.csv files based on the id and user_id columns, and selects the name, order_date, and total columns.

Getting Started

To get started with csvq, follow these steps:

Download the appropriate binary for your operating system from the GitHub releases page.
Extract the downloaded archive and add the csvq executable to your system's PATH.
Create a CSV file (e.g., data.csv) with some sample data:

name,age,email
John Doe,35,john.doe@example.com
Jane Smith,28,jane.smith@example.com
Bob Johnson,42,bob.johnson@example.com

Run a simple query using the csvq command:

csvq "SELECT name, age FROM data.csv"

This will output the name and age columns from the data.csv file.

Explore the csvq documentation to learn more about the available features and capabilities.

Competitor Comparisons

clickhouse-java

1,548

ClickHouse Java Clients & JDBC Driver

Pros of ClickHouse/clickhouse-java

Supports a wide range of data types and features provided by ClickHouse, a high-performance analytical database.
Provides a simple and intuitive API for interacting with ClickHouse, making it easy to integrate into Java-based applications.
Offers advanced features like connection pooling, query execution, and result set handling.

Cons of ClickHouse/clickhouse-java

Primarily focused on ClickHouse, limiting its usefulness for working with other database systems.
May have a steeper learning curve compared to more generic CSV-handling libraries like csvq.
Requires additional setup and configuration to work with ClickHouse, which may not be necessary for simpler CSV-based use cases.

Code Comparison

ClickHouse/clickhouse-java:

try (Connection connection = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default")) {
    Statement statement = connection.createStatement();
    ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name LIMIT 10");
    while (resultSet.next()) {
        // Process the result set
    }
}

mithrandie/csvq:

query := "SELECT * FROM file.csv LIMIT 10"
rows, err := csvq.Query(query)
if err != nil {
    // Handle the error
}
for rows.Next() {
    // Process the rows
}

hive

5,749

Apache Hive

Pros of Apache Hive

Hive is a powerful data warehousing solution that can handle large-scale data processing and analysis.
It provides a SQL-like interface (HiveQL) that allows users to query and manipulate data stored in a Hadoop cluster.
Hive integrates well with other Apache projects, such as Spark and Impala, making it a versatile tool in the big data ecosystem.

Cons of Apache Hive

Hive can be more complex to set up and configure compared to CSVQ.
The performance of Hive queries can be slower than other big data tools, especially for small datasets.
Hive's SQL-like syntax may not be as familiar to users who are more comfortable with traditional SQL databases.

Code Comparison

CSVQ:

SELECT name, age
FROM people
WHERE age > 30
ORDER BY age DESC
LIMIT 10;

Apache Hive:

SELECT name, age
FROM people
WHERE age > 30
ORDER BY age DESC
LIMIT 10;

The syntax for both CSVQ and Apache Hive is very similar, as they both use a SQL-like language for querying data. The main difference is that CSVQ is designed for working with CSV files, while Apache Hive is a data warehousing solution for Hadoop-based data.

spark

41,366

Apache Spark - A unified analytics engine for large-scale data processing

Pros of Apache Spark

Distributed processing: Spark is designed for distributed computing, allowing it to handle large-scale data processing tasks efficiently.
Diverse ecosystem: Spark has a rich ecosystem with a wide range of libraries and tools, such as Spark SQL, Spark Streaming, and MLlib, which expand its capabilities.
Performance: Spark's in-memory processing and optimized execution engine make it a high-performance data processing framework.

Cons of Apache Spark

Complexity: Spark has a steeper learning curve compared to mithrandie/csvq, as it requires understanding distributed computing concepts and its various components.
Resource requirements: Spark typically requires more computational resources, such as memory and CPU, to achieve its high performance, which can be a limitation for some use cases.

Code Comparison

Apache Spark (Scala):

val spark = SparkSession.builder()
  .appName("WordCount")
  .getOrCreate()

val textFile = spark.read.textFile("README.md")
val counts = textFile.flatMap(line => line.split(" "))
  .groupBy("value")
  .count()

counts.show()

mithrandie/csvq (Go):

q := csvq.New()
q.Query("SELECT * FROM data.csv")

rows, err := q.Exec()
if err != nil {
    panic(err)
}

for _, row := range rows {
    fmt.Println(row)
}

drill

1,985

Apache Drill is a distributed MPP query layer for self describing data

Pros of Apache Drill

Apache Drill is a distributed SQL query engine that can query a variety of data sources, including CSV files, making it a more versatile tool compared to mithrandie/csvq.
Drill supports a wide range of data formats and data sources, including NoSQL databases, cloud storage, and more, allowing for more flexible data analysis.
Drill's distributed architecture allows for scalable and high-performance data processing, making it suitable for large-scale data analysis tasks.

Cons of Apache Drill

Apache Drill has a steeper learning curve compared to mithrandie/csvq, as it requires more setup and configuration to get started.
Drill's complexity and feature-richness may be overkill for users who only need to work with CSV files, making mithrandie/csvq a simpler and more lightweight alternative.
Drill's performance may be affected by the complexity of the data sources and queries, whereas mithrandie/csvq is optimized for CSV file processing.

Code Comparison

mithrandie/csvq:

func (c *CSV) Query(query string) (*Result, error) {
    parser := NewParser(query)
    stmt, err := parser.Parse()
    if err != nil {
        return nil, err
    }

    return c.ExecuteStatement(stmt)
}

Apache Drill:

public class DrillClient {
    private DrillConfig config;
    private DrillClient(DrillConfig config) {
        this.config = config;
    }

    public static DrillClient create(DrillConfig config) {
        return new DrillClient(config);
    }

    public QuerySubmitter querySubmitter() {
        return new QuerySubmitter(this.config);
    }
}

kylin

3,718

Apache Kylin

Pros of Apache Kylin

Kylin is a distributed OLAP engine that can handle large-scale data processing and analysis.
It provides a SQL-based interface for querying data, making it accessible to a wide range of users.
Kylin supports a variety of data sources, including Hadoop, Hive, and Spark, making it a versatile tool.

Cons of Apache Kylin

Kylin has a steeper learning curve compared to CSVQ, as it requires more setup and configuration.
The deployment and maintenance of Kylin can be more complex, especially in large-scale environments.
Kylin may have higher resource requirements (CPU, memory, storage) compared to CSVQ, depending on the scale of the data being processed.

Code Comparison

CSVQ:

func main() {
    csvq.Run()
}

Apache Kylin:

public class KylinApplication {
    public static void main(String[] args) {
        SpringApplication.run(KylinApplication.class, args);
    }
}

impala

1,229

Apache Impala

Pros of Impala

Impala is a high-performance, massively parallel processing (MPP) SQL query engine for data stored in a computer cluster.
Impala supports a wide range of data formats, including Parquet, Avro, and ORC, making it a versatile tool for working with large datasets.
Impala is designed to be highly scalable, allowing it to handle large amounts of data and high query loads.

Cons of Impala

Impala is a complex and resource-intensive system, requiring a significant investment in hardware and infrastructure to set up and maintain.
Impala's learning curve can be steep, as it requires a deep understanding of distributed systems and SQL optimization.

Code Comparison

CSVQ:

func main() {
    csvq.Run(os.Args[1:])
}

Impala:

public class Main {
    public static void main(String[] args) {
        // Impala initialization and query execution code
    }
}

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

csvq

SQL-like query language for csv

Csvq is a command line tool to operate CSV files. You can read, update, delete CSV records with SQL-like query.

You can also execute multiple operations sequentially in managed transactions by passing a procedure or using the interactive shell. In the multiple operations, you can use variables, cursors, temporary tables, and other features.

Latest Release

Intended Use

Csvq is intended for one-time queries and routine processing described in source files on the amount of data that can be handled by spreadsheet applications.

It is not suitable for handling very large data since all data is kept on memory when queries are executed. There is no indexing, calculation order optimization, etc., and the execution speed is not fast due to the inclusion of mechanisms for updating data and handling various other features.

However, it can be run with a single executable binary, and you don't have to worry about troublesome dependencies during installation. You can not only write and run your own queries, but also share source files with co-workers on multiple platforms.

This tool may be useful for those who want to handle data easily and roughly, without having to think about troublesome matters.

Features

CSV File Operation
- Select Query
- Insert Query
- Update Query
- Replace Query
- Delete Query
- Create Table Query
- Alter Table Query
Cursor
Temporary Table
Transaction Management
Support loading data from Standard Input
Support following file formats
- CSV
- TSV
- LTSV
- Fixed-Length Format
- JSON
- JSON Lines
Support following file encodings
- UTF-8
- UTF-16
- Shift_JIS
JSON and JSON Lines formats support only UTF-8.

Reference Manual

Reference Manual - csvq

Installation

Install executable binary

Download an archive file from release page.
Extract the downloaded archive and add a binary file in it to your path.

Build from source

Requirements

Go 1.18 or later (cf. Getting Started - The Go Programming Language)

Build command

$ go install github.com/mithrandie/csvq

Install using package manager

Installing using a package manager does not ensure that you always get the latest version, but it may make installation and updating easier.

Ubuntu

$ sudo add-apt-repository ppa:mithrandie/csvq
$ sudo apt update
$ sudo apt install csvq

Arch Linux (unofficial)

Install the csvq-git or csvq-bin from the Arch User Repository (e.g. yay -S csvq-git)

macOS (unofficial)

Install homebrew (cf. The missing package manager for macOS (or Linux) â Homebrew)
$ brew install csvq

Usage

# Simple query
csvq 'select id, name from `user.csv`'
csvq 'select id, name from user'

# Specify data delimiter as tab character
csvq -d '\t' 'select count(*) from `user.csv`'

# Load no-header-csv
csvq --no-header 'select c1, c2 from user'

# Load from redirection or pipe
csvq 'select * from stdin' < user.csv
cat user.csv | csvq 'select *'

# Load from Fixed-Length Format
cat /var/log/syslog | csvq -n -i fixed -m '[15, 24, 124]' 'select *'

# Split lines with spaces automatically
ps | csvq -i fixed -m spaces 'select * from stdin'

# Output in JSON format
csvq -f json 'select integer(id) as id, name from user'

# Output to a file
csvq -o new_user.csv 'select id, name from user'

# Load statements from file
$ cat statements.sql
VAR @id := 0;
SELECT @id := @id + 1 AS id,
       name
  FROM user;

$ csvq -s statements.sql

# Execute statements in the interactive shell
$ csvq
csvq > UPDATE users SET name = 'Mildred' WHERE id = 2;
1 record updated on "/home/mithrandie/docs/csv/users.csv".
csvq > COMMIT;
Commit: file "/home/mithrandie/docs/csv/users.csv" is updated.
csvq > EXIT;

# Show help
csvq -h

More details >> https://mithrandie.github.io/csvq

Execute csvq statements in Go

csvq-driver

Example of cooperation with other applications

csvq emacs extension

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot