querybook

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.

Top Related Projects

superset

67,369

Apache Superset is a Data Visualization and Data Exploration Platform

metabase

43,074

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data :bar_chart:

redash

27,617

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

zeppelin

6,535

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

falcon

5,118

Free, open-source SQL client for Windows and Mac 🦅

Quick Overview

Querybook is an open-source big data IDE developed by Pinterest. It provides a web-based interface for writing, running, and sharing SQL queries across multiple data engines. Querybook aims to improve data discovery, collaboration, and analysis within organizations.

Pros

Supports multiple data engines (e.g., Presto, Hive, Snowflake)
Offers collaborative features like query sharing and version control
Provides a user-friendly interface with syntax highlighting and auto-completion
Includes data lineage and metadata exploration capabilities

Cons

Requires significant setup and configuration for enterprise use
May have a learning curve for users new to big data querying
Limited customization options compared to some proprietary solutions
Dependency on specific backend technologies may limit flexibility

Getting Started

To set up Querybook locally:

Clone the repository:

git clone https://github.com/pinterest/querybook.git
cd querybook

Install dependencies:

pip install -r requirements.txt
npm install

Set up the database:
```
python querybook/scripts/init_db.py
```
Start the development server:
```
python querybook/scripts/runserver.py
```
Access Querybook at http://localhost:10001 in your web browser.

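The project README describes a Docker-based quick setup (install Docker, then run make in the cloned repository). For detailed installation and configuration instructions, refer to the project's documentation.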

Competitor Comparisons

superset

67,369

Apache Superset is a Data Visualization and Data Exploration Platform

Pros of Superset

More mature and widely adopted project with a larger community and ecosystem
Offers a broader range of visualization types and chart options
Provides more advanced features for data exploration and dashboard creation

Cons of Superset

Steeper learning curve and more complex setup process
Requires more system resources and may be overkill for simpler use cases
Less focused on collaborative query editing and sharing compared to Querybook

Code Comparison

Superset (Python):

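from superset import db
from superset.models.slice import Slice

# Build a chart record (a "slice" in Superset's data model).
# This must run inside a Superset application context.
chart = Slice(
    slice_name="My Chart",
    datasource_type="table",
    datasource_id=1,  # ID of an existing dataset
    viz_type="bar",
    params="{}",  # JSON-encoded chart options
)
db.session.add(chart)
db.session.commit()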

Querybook (TypeScript):

import { QueryExecutionAPI } from 'lib/api/QueryExecutionAPI';

const executeQuery = async (queryId: number) => {
  const result = await QueryExecutionAPI.executeQuery(queryId);
  return result.data;
};

The code snippets demonstrate different aspects of each project. Superset's example shows creating a chart using its data model, while Querybook's example focuses on executing a query through its API.

metabase

43,074

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data :bar_chart:

Pros of Metabase

More mature and widely adopted project with a larger community
Offers a user-friendly interface for non-technical users to create visualizations
Supports a wider range of databases and data sources out-of-the-box

Cons of Metabase

Less focused on collaborative query editing and version control
May be more resource-intensive for large-scale deployments
Limited customization options for advanced users compared to Querybook

Code Comparison

Metabase (Clojure):

(defn run-query
  [query]
  (let [database (db/select-one Database :id (:database query))
        driver   (driver/database-id->driver (:id database))]
    (driver/execute-query driver query)))

Querybook (Python):

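from sqlalchemy import text

def run_query(query, engine):
    # Wrap the raw SQL string in text(); SQLAlchemy 2.0 no longer accepts plain strings.
    with engine.connect() as connection:
        result = connection.execute(text(query))
        return result.fetchall()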

Both projects handle query execution, but Metabase's implementation is more abstracted and supports multiple database drivers, while Querybook's approach is more straightforward and relies on SQLAlchemy for database connections.

redash

27,617

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Pros of Redash

More mature project with a larger community and wider adoption
Supports a broader range of data sources out-of-the-box
Offers a more polished and user-friendly interface for non-technical users

Cons of Redash

Less focus on collaboration features compared to Querybook
May require more setup and configuration for complex environments
Limited built-in version control for queries

Code Comparison

Redash query execution:

query_runner = get_query_runner(data_source.type, data_source.options)
data, error = query_runner.run_query(query, user)

Querybook query execution:

engine = get_query_engine(data_source)
result = engine.execute_query(query, user)

Both snippets follow the same pattern: resolve an executor from the data source, then run the query on behalf of a user. The visible differences are naming and return shape: the Redash runner returns a (data, error) pair, while the Querybook snippet returns a single result.

zeppelin

6,535

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Pros of Zeppelin

More mature project with a larger community and ecosystem
Supports a wider range of programming languages and data processing frameworks
Offers built-in visualization tools for data exploration

Cons of Zeppelin

Heavier resource requirements and more complex setup process
Steeper learning curve for new users
Less focus on collaborative features compared to Querybook

Code Comparison

Zeppelin notebook cell (Python):

%python
import pandas as pd
df = pd.read_csv('data.csv')
df.head()

Querybook query cell (SQL):

SELECT *
FROM my_table
LIMIT 5;

Zeppelin selects an interpreter per paragraph with a leading % directive (for example %python), so one notebook can mix several languages and visualization blocks. Querybook cells are primarily SQL. In short, Zeppelin is the broader data science platform, while Querybook offers a more focused experience for SQL-based data exploration and collaboration within organizations.
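
To make the % convention concrete, here is a minimal Python sketch of how a notebook front end could split a cell into an interpreter name and a body. It is not Zeppelin's parser; the function name `split_interpreter` and the fallback to `sql` are assumptions made for illustration.

```python
from typing import Tuple


def split_interpreter(cell: str, default: str = "sql") -> Tuple[str, str]:
    """Split a notebook cell into (interpreter, body).

    A first line of the form '%name' selects the interpreter; any other
    cell falls back to `default` (an assumed default, not a Zeppelin rule).
    """
    first, _, rest = cell.lstrip().partition("\n")
    name = first[1:].strip() if first.startswith("%") else ""
    if name:
        return name, rest
    return default, cell


print(split_interpreter("%python\nimport pandas as pd\ndf = pd.read_csv('data.csv')"))
print(split_interpreter("SELECT *\nFROM my_table\nLIMIT 5;"))
```

The second call shows the SQL-first case: with no directive, the whole cell is treated as a query.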

falcon

5,118

Free, open-source SQL client for Windows and Mac 🦅

Pros of Falcon

Built with React and TypeScript, offering a modern and type-safe frontend development experience
Focuses on interactive data visualization and dashboarding capabilities
Provides a more customizable and extensible architecture for building data applications

Cons of Falcon

Less emphasis on collaborative features and team-oriented workflows
May require more setup and configuration compared to Querybook's out-of-the-box solution
Smaller community and fewer resources available for support and troubleshooting

Code Comparison

Querybook (Python):

from querybook.models import DataDoc

data_doc = DataDoc.create(
    title="My Data Document",
    owner_uid=user.id,
    environment_id=environment.id
)

Falcon (TypeScript):

import { Dashboard } from '@plotly/falcon';

const dashboard = new Dashboard({
  title: 'My Dashboard',
  layout: 'grid',
  items: [
    { type: 'chart', query: 'SELECT * FROM users' }
  ]
});

Both repositories offer solutions for data exploration and analysis, but they cater to different use cases. Querybook is more focused on collaborative SQL editing and execution, while Falcon emphasizes interactive data visualization and dashboard creation. The choice between the two depends on specific project requirements and team preferences.

README

Querybook

Querybook is a Big Data IDE that allows you to discover, create, and share data analyses, queries, and tables. See the project documentation for full details and feature highlights.

Features

Organize analyses with rich text, queries, and charts
Compose queries with autocompletion and hovering tooltip
Use scheduling + charting in DataDocs to build dashboards
Live query collaborations with others
Add additional documentation to your tables
Get lineage, sample queries, frequent user, search ranking based on past query runs
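
As a rough sketch of how a statistic such as "frequent user" can be derived from past query runs, the Python snippet below counts runs per user for each table. The input shape, a list of (user, table) pairs, and the function name `frequent_users` are hypothetical; Querybook's own pipeline is not shown in this document.

```python
from collections import Counter, defaultdict


def frequent_users(query_runs, top_n=3):
    """Return the top_n most frequent users per table.

    query_runs: iterable of (user, table) pairs, one pair for each table
    touched by a past query run (a hypothetical log format).
    """
    per_table = defaultdict(Counter)
    for user, table in query_runs:
        per_table[table][user] += 1
    return {table: counts.most_common(top_n) for table, counts in per_table.items()}


runs = [("ana", "events"), ("bo", "events"), ("ana", "events"), ("bo", "users")]
print(frequent_users(runs))
```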

Getting started

Prerequisite

Please install Docker before trying out Querybook.

Quick setup

Pull this repo and run make. Visit http://localhost:10001 when the build completes.
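
Because the build can take a while, it can help to check from a script whether the web server is answering yet. The following is a minimal Python sketch using only the standard library; the helper name `is_up` and the injectable `fetch` parameter are illustrative choices, not part of Querybook.

```python
import urllib.error
import urllib.request


def is_up(url: str, fetch=urllib.request.urlopen) -> bool:
    """Return True if `url` answers with a successful HTTP response."""
    try:
        with fetch(url, timeout=5) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False
```

Calling `is_up("http://localhost:10001")` in a retry loop gives a simple readiness check for the address mentioned above.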

For more details on installation, see the project documentation.

Configuration

Infrastructure configuration and general configuration are each covered in the project documentation.

Supported Integrations

Query Engines

Presto
Hive
Druid
Snowflake
BigQuery
MySQL
SQLite
PostgreSQL
and many more...

Authentication

User/Password
OAuth
- Google Cloud OAuth
- Okta OAuth
- GitHub OAuth
- Auth0 OAuth
LDAP

Metastore

Can be used to fetch schema and table information for metadata enrichment.

Hive Metastore
SQLAlchemy Inspect
AWS Glue Data Catalog
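
To illustrate what fetching schema and table information involves, the sketch below reads table and column names from a SQLite database using only Python's standard library. It stands in for the general idea and is not Querybook's metastore loader interface; the function name `load_table_metadata` and the sample `events` table are made up for the example.

```python
import sqlite3


def load_table_metadata(conn: sqlite3.Connection) -> dict:
    """Map each user table name to a list of (column name, declared type)."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    metadata = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        quoted = table.replace('"', '""')
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        metadata[table] = [(col[1], col[2]) for col in columns]
    return metadata


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT, ts REAL)")
print(load_table_metadata(conn))
```

A metadata-enrichment layer would then attach descriptions, owners, or usage statistics to these table and column names.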

Result Storage

Use one of the following to store query results.

Database (MySQL, Postgres, etc.)
S3
Google Cloud Storage
Local file
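
The list above implies a pluggable storage layer: the same save and load operations, with different backends behind them. The Python sketch below shows that shape with a local-file backend. The `ResultStore` interface and `LocalFileStore` class are hypothetical names for illustration and do not reflect Querybook's actual plugin API.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, List, Protocol, Sequence


class ResultStore(Protocol):
    """Hypothetical interface: persist and reload tabular query results."""

    def save(self, key: str, rows: Iterable[Sequence[str]]) -> None: ...

    def load(self, key: str) -> List[List[str]]: ...


class LocalFileStore:
    """Stores each result set as a CSV file under `root`."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / f"{key}.csv"

    def save(self, key: str, rows: Iterable[Sequence[str]]) -> None:
        with self._path(key).open("w", newline="") as f:
            csv.writer(f).writerows(rows)

    def load(self, key: str) -> List[List[str]]:
        with self._path(key).open(newline="") as f:
            return [row for row in csv.reader(f)]


store: ResultStore = LocalFileStore(Path(tempfile.mkdtemp()))
store.save("query_1", [["id", "name"], ["1", "alice"]])
print(store.load("query_1"))
```

An S3 or Google Cloud Storage backend would implement the same two methods, so calling code would not need to change. Note that CSV round-trips every value as a string.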

Result Export

Upload query results from Querybook to other tools for further analyses.

Google Sheets Export
Python export

Notification

Get notified upon completion of queries and DataDoc invitations via IM or email.

Email
Slack

User Interface

Query Editor

Charting

Scheduling

Lineage & Analytics

Contributing Back

See CONTRIBUTING.
