Top Related Projects
Quick Overview
csvq is a SQL-like query language for CSV, TSV, and other delimited data files. It allows users to perform various data manipulation and analysis tasks on tabular data files using a familiar SQL-like syntax.
Pros
- SQL-like Syntax: csvq provides a SQL-like query language, making it easy for users familiar with SQL to work with tabular data files.
- Supports Multiple File Formats: csvq can handle a variety of delimited data formats, including CSV, TSV, and others.
- Powerful Data Manipulation: csvq offers a wide range of data manipulation capabilities, including filtering, sorting, aggregation, and more.
- Cross-platform Compatibility: csvq is a cross-platform tool, available for Windows, macOS, and Linux.
Cons
- Limited to Tabular Data: csvq is designed specifically for working with tabular data files, and may not be suitable for more complex data structures.
- Learning Curve: While the SQL-like syntax is familiar, users may still need to learn the specific syntax and capabilities of csvq.
- Performance Limitations: For very large data sets, csvq's performance may not be as efficient as specialized data analysis tools.
- Lack of Advanced Visualization: csvq is focused on data manipulation and analysis, and does not provide advanced data visualization capabilities.
Code Examples
Here are a few examples of how to use csvq:
- Selecting Columns:
SELECT name, age, email FROM data.csv
This query selects the name
, age
, and email
columns from the data.csv
file.
- Filtering Data:
SELECT * FROM data.csv WHERE age > 30 AND city = 'New York'
This query selects all rows from data.csv
where the age
is greater than 30 and the city
is 'New York'.
- Aggregating Data:
SELECT city, AVG(age) AS avg_age FROM data.csv GROUP BY city
This query calculates the average age for each unique city in the data.csv
file.
- Joining Data:
SELECT users.name, orders.order_date, orders.total
FROM users.csv
JOIN orders.csv ON users.id = orders.user_id
This query joins the users.csv
and orders.csv
files based on the id
and user_id
columns, and selects the name
, order_date
, and total
columns.
Getting Started
To get started with csvq, follow these steps:
- Download the appropriate binary for your operating system from the GitHub releases page.
- Extract the downloaded archive and add the
csvq
executable to your system's PATH. - Create a CSV file (e.g.,
data.csv
) with some sample data:
name,age,email
John Doe,35,john.doe@example.com
Jane Smith,28,jane.smith@example.com
Bob Johnson,42,bob.johnson@example.com
- Run a simple query using the
csvq
command:
csvq "SELECT name, age FROM data.csv"
This will output the name
and age
columns from the data.csv
file.
- Explore the csvq documentation to learn more about the available features and capabilities.
Competitor Comparisons
ClickHouse Java Clients & JDBC Driver
Pros of ClickHouse/clickhouse-java
- Supports a wide range of data types and features provided by ClickHouse, a high-performance analytical database.
- Provides a simple and intuitive API for interacting with ClickHouse, making it easy to integrate into Java-based applications.
- Offers advanced features like connection pooling, query execution, and result set handling.
Cons of ClickHouse/clickhouse-java
- Primarily focused on ClickHouse, limiting its usefulness for working with other database systems.
- May have a steeper learning curve compared to more generic CSV-handling libraries like csvq.
- Requires additional setup and configuration to work with ClickHouse, which may not be necessary for simpler CSV-based use cases.
Code Comparison
ClickHouse/clickhouse-java:
try (Connection connection = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default")) {
Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name LIMIT 10");
while (resultSet.next()) {
// Process the result set
}
}
mithrandie/csvq:
query := "SELECT * FROM file.csv LIMIT 10"
rows, err := csvq.Query(query)
if err != nil {
// Handle the error
}
for rows.Next() {
// Process the rows
}
Apache Hive
Pros of Apache Hive
- Hive is a powerful data warehousing solution that can handle large-scale data processing and analysis.
- It provides a SQL-like interface (HiveQL) that allows users to query and manipulate data stored in a Hadoop cluster.
- Hive integrates well with other Apache projects, such as Spark and Impala, making it a versatile tool in the big data ecosystem.
Cons of Apache Hive
- Hive can be more complex to set up and configure compared to CSVQ.
- The performance of Hive queries can be slower than other big data tools, especially for small datasets.
- Hive's SQL-like syntax may not be as familiar to users who are more comfortable with traditional SQL databases.
Code Comparison
CSVQ:
SELECT name, age
FROM people
WHERE age > 30
ORDER BY age DESC
LIMIT 10;
Apache Hive:
SELECT name, age
FROM people
WHERE age > 30
ORDER BY age DESC
LIMIT 10;
The syntax for both CSVQ and Apache Hive is very similar, as they both use a SQL-like language for querying data. The main difference is that CSVQ is designed for working with CSV files, while Apache Hive is a data warehousing solution for Hadoop-based data.
Apache Spark - A unified analytics engine for large-scale data processing
Pros of Apache Spark
- Distributed processing: Spark is designed for distributed computing, allowing it to handle large-scale data processing tasks efficiently.
- Diverse ecosystem: Spark has a rich ecosystem with a wide range of libraries and tools, such as Spark SQL, Spark Streaming, and MLlib, which expand its capabilities.
- Performance: Spark's in-memory processing and optimized execution engine make it a high-performance data processing framework.
Cons of Apache Spark
- Complexity: Spark has a steeper learning curve compared to mithrandie/csvq, as it requires understanding distributed computing concepts and its various components.
- Resource requirements: Spark typically requires more computational resources, such as memory and CPU, to achieve its high performance, which can be a limitation for some use cases.
Code Comparison
Apache Spark (Scala):
val spark = SparkSession.builder()
.appName("WordCount")
.getOrCreate()
val textFile = spark.read.textFile("README.md")
val counts = textFile.flatMap(line => line.split(" "))
.groupBy("value")
.count()
counts.show()
mithrandie/csvq (Go):
q := csvq.New()
q.Query("SELECT * FROM data.csv")
rows, err := q.Exec()
if err != nil {
panic(err)
}
for _, row := range rows {
fmt.Println(row)
}
Apache Drill is a distributed MPP query layer for self describing data
Pros of Apache Drill
- Apache Drill is a distributed SQL query engine that can query a variety of data sources, including CSV files, making it a more versatile tool compared to mithrandie/csvq.
- Drill supports a wide range of data formats and data sources, including NoSQL databases, cloud storage, and more, allowing for more flexible data analysis.
- Drill's distributed architecture allows for scalable and high-performance data processing, making it suitable for large-scale data analysis tasks.
Cons of Apache Drill
- Apache Drill has a steeper learning curve compared to mithrandie/csvq, as it requires more setup and configuration to get started.
- Drill's complexity and feature-richness may be overkill for users who only need to work with CSV files, making mithrandie/csvq a simpler and more lightweight alternative.
- Drill's performance may be affected by the complexity of the data sources and queries, whereas mithrandie/csvq is optimized for CSV file processing.
Code Comparison
mithrandie/csvq:
func (c *CSV) Query(query string) (*Result, error) {
parser := NewParser(query)
stmt, err := parser.Parse()
if err != nil {
return nil, err
}
return c.ExecuteStatement(stmt)
}
Apache Drill:
public class DrillClient {
private DrillConfig config;
private DrillClient(DrillConfig config) {
this.config = config;
}
public static DrillClient create(DrillConfig config) {
return new DrillClient(config);
}
public QuerySubmitter querySubmitter() {
return new QuerySubmitter(this.config);
}
}
Apache Kylin
Pros of Apache Kylin
- Kylin is a distributed OLAP engine that can handle large-scale data processing and analysis.
- It provides a SQL-based interface for querying data, making it accessible to a wide range of users.
- Kylin supports a variety of data sources, including Hadoop, Hive, and Spark, making it a versatile tool.
Cons of Apache Kylin
- Kylin has a steeper learning curve compared to CSVQ, as it requires more setup and configuration.
- The deployment and maintenance of Kylin can be more complex, especially in large-scale environments.
- Kylin may have higher resource requirements (CPU, memory, storage) compared to CSVQ, depending on the scale of the data being processed.
Code Comparison
CSVQ:
func main() {
csvq.Run()
}
Apache Kylin:
public class KylinApplication {
public static void main(String[] args) {
SpringApplication.run(KylinApplication.class, args);
}
}
Apache Impala
Pros of Impala
- Impala is a high-performance, massively parallel processing (MPP) SQL query engine for data stored in a computer cluster.
- Impala supports a wide range of data formats, including Parquet, Avro, and ORC, making it a versatile tool for working with large datasets.
- Impala is designed to be highly scalable, allowing it to handle large amounts of data and high query loads.
Cons of Impala
- Impala is a complex and resource-intensive system, requiring a significant investment in hardware and infrastructure to set up and maintain.
- Impala's learning curve can be steep, as it requires a deep understanding of distributed systems and SQL optimization.
Code Comparison
CSVQ:
func main() {
csvq.Run(os.Args[1:])
}
Impala:
public class Main {
public static void main(String[] args) {
// Impala initialization and query execution code
}
}
Convert
designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual CopilotREADME
csvq
SQL-like query language for csv
Csvq is a command line tool to operate CSV files. You can read, update, delete CSV records with SQL-like query.
You can also execute multiple operations sequentially in managed transactions by passing a procedure or using the interactive shell. In the multiple operations, you can use variables, cursors, temporary tables, and other features.
Latest Release
Intended Use
Csvq is intended for one-time queries and routine processing described in source files on the amount of data that can be handled by spreadsheet applications.
It is not suitable for handling very large data since all data is kept on memory when queries are executed. There is no indexing, calculation order optimization, etc., and the execution speed is not fast due to the inclusion of mechanisms for updating data and handling various other features.
However, it can be run with a single executable binary, and you don't have to worry about troublesome dependencies during installation. You can not only write and run your own queries, but also share source files with co-workers on multiple platforms.
This tool may be useful for those who want to handle data easily and roughly, without having to think about troublesome matters.
Features
- CSV File Operation
- Select Query
- Insert Query
- Update Query
- Replace Query
- Delete Query
- Create Table Query
- Alter Table Query
- Cursor
- Temporary Table
- Transaction Management
- Support loading data from Standard Input
- Support following file formats
- CSV
- TSV
- LTSV
- Fixed-Length Format
- JSON
- JSON Lines
- Support following file encodings
-
UTF-8
-
UTF-16
-
Shift_JIS
JSON and JSON Lines formats support only UTF-8.
-
Reference Manual
Installation
Install executable binary
- Download an archive file from release page.
- Extract the downloaded archive and add a binary file in it to your path.
Build from source
Requirements
Go 1.18 or later (cf. Getting Started - The Go Programming Language)
Build command
$ go install github.com/mithrandie/csvq
Install using package manager
Installing using a package manager does not ensure that you always get the latest version, but it may make installation and updating easier.
Ubuntu
$ sudo add-apt-repository ppa:mithrandie/csvq
$ sudo apt update
$ sudo apt install csvq
Arch Linux (unofficial)
Install the csvq-git or csvq-bin from the Arch User Repository (e.g. yay -S csvq-git
)
macOS (unofficial)
- Install homebrew (cf. The missing package manager for macOS (or Linux) â Homebrew)
$ brew install csvq
Usage
# Simple query
csvq 'select id, name from `user.csv`'
csvq 'select id, name from user'
# Specify data delimiter as tab character
csvq -d '\t' 'select count(*) from `user.csv`'
# Load no-header-csv
csvq --no-header 'select c1, c2 from user'
# Load from redirection or pipe
csvq 'select * from stdin' < user.csv
cat user.csv | csvq 'select *'
# Load from Fixed-Length Format
cat /var/log/syslog | csvq -n -i fixed -m '[15, 24, 124]' 'select *'
# Split lines with spaces automatically
ps | csvq -i fixed -m spaces 'select * from stdin'
# Output in JSON format
csvq -f json 'select integer(id) as id, name from user'
# Output to a file
csvq -o new_user.csv 'select id, name from user'
# Load statements from file
$ cat statements.sql
VAR @id := 0;
SELECT @id := @id + 1 AS id,
name
FROM user;
$ csvq -s statements.sql
# Execute statements in the interactive shell
$ csvq
csvq > UPDATE users SET name = 'Mildred' WHERE id = 2;
1 record updated on "/home/mithrandie/docs/csv/users.csv".
csvq > COMMIT;
Commit: file "/home/mithrandie/docs/csv/users.csv" is updated.
csvq > EXIT;
# Show help
csvq -h
More details >> https://mithrandie.github.io/csvq
Execute csvq statements in Go
Example of cooperation with other applications
Top Related Projects
Convert
designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot