Top Related Projects
Quick Overview
Airpal is a web-based query execution tool built on top of Facebook's PrestoDB. It provides a user-friendly interface for data analysts and scientists to run SQL queries, explore data, and collaborate on data analysis tasks. Airpal aims to make PrestoDB more accessible and easier to use for non-technical users.
Pros
- User-friendly web interface for running SQL queries
- Built-in collaboration features for sharing queries and results
- Supports saved queries and a searchable query history
- Integrates well with existing PrestoDB deployments
Cons
- Limited to PrestoDB as the underlying query engine
- May require additional setup and maintenance compared to using PrestoDB directly
- Less flexible than raw SQL for advanced users
- The project is deprecated; most functionality and feature work has moved to SQL Lab within Apache Superset
Getting Started
To get started with Airpal, follow these steps:
- Clone the repository:
  git clone https://github.com/airbnb/airpal.git
- Build the project:
  cd airpal
  ./gradlew clean shadowJar
- Configure Airpal by creating a reference.yml file based on the provided example:
  cp reference.example.yml reference.yml
- Edit reference.yml to set up your PrestoDB connection, MySQL credentials, and other settings.
- Migrate the database:
  java -Duser.timezone=UTC -cp build/libs/airpal-*-all.jar com.airbnb.airpal.AirpalApplication db migrate reference.yml
- Run Airpal:
  java -server -Duser.timezone=UTC -cp build/libs/airpal-*-all.jar com.airbnb.airpal.AirpalApplication server reference.yml
- Access the Airpal web interface at http://localhost:8081 (or the configured port).
Note: Airpal needs Java 7 or higher, Gradle 2.2 or higher, a MySQL database, and Presto 0.77 or higher; the README's Requirements section lists the full set.
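After the server process starts, it can take a moment before the web interface answers. The following is a minimal sketch, not part of Airpal, of a readiness check that polls the URL from the last step. The function name `wait_until_up` and its parameters are illustrative assumptions; only the Python standard library is used.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, timeout=60.0, interval=2.0):
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with a non-2xx status, so it is reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_up("http://localhost:8081")` returns `True` once something answers on that port and `False` if nothing does within the timeout. Change the port if your reference.yml configures a different one.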
Competitor Comparisons
Presto - The official home of the Presto distributed SQL query engine for big data
Pros of Presto
- More powerful and flexible SQL engine for big data analytics
- Supports a wider range of data sources and connectors
- Active development with frequent updates and improvements
Cons of Presto
- Steeper learning curve and more complex setup
- Requires more resources to run effectively
- Less user-friendly interface for non-technical users
Code Comparison
Airpal (JavaScript):
var AirpalApp = React.createClass({
  render: function() {
    return (
      <div className="airpal-app">
        <Header />
        <ExecutionController />
      </div>
    );
  }
});
Presto (Java):
public class PrestoServer
        implements Runnable
{
    public static void main(String[] args)
            throws Exception
    {
        new PrestoServer().run();
    }
}
Presto offers a more robust SQL engine with broader data source support, making it suitable for complex big data analytics. However, it requires more technical expertise and resources to set up and maintain. Airpal provides a more user-friendly interface for running Presto queries but with limited functionality compared to Presto's full capabilities. The code comparison shows Airpal's focus on the frontend user interface, while Presto's core is implemented in Java for performance and scalability.
Apache Hive
Pros of Hive
- More comprehensive data warehousing solution with broader functionality
- Supports multiple data formats and storage systems
- Larger community and ecosystem with extensive documentation
Cons of Hive
- Steeper learning curve and more complex setup
- Slower query performance for small to medium-sized datasets
- Requires more resources and maintenance
Code Comparison
Hive SQL query:
SELECT customer_id, SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(order_total) > 1000;
Airpal query (using Presto SQL):
SELECT customer_id, SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(order_total) > 1000;
Key Differences
- Hive is a full-fledged data warehousing solution, while Airpal is a web-based query interface for Presto
- Hive supports multiple storage systems, whereas Airpal can query only what the underlying Presto deployment exposes through its connectors
- Hive has a more complex architecture, while Airpal focuses on simplifying the query process for end-users
- Hive offers more advanced features like indexing and partitioning, while Airpal emphasizes ease of use and collaboration
Both tools have their strengths, with Hive being more suitable for large-scale data processing and Airpal excelling in user-friendly ad-hoc querying.
Apache Spark - A unified analytics engine for large-scale data processing
Pros of Spark
- Powerful distributed computing framework for big data processing
- Supports multiple programming languages (Scala, Java, Python, R)
- Extensive ecosystem with libraries for machine learning, graph processing, and streaming
Cons of Spark
- Steeper learning curve and more complex setup compared to Airpal
- Requires more resources and infrastructure to run effectively
- May be overkill for simpler data analysis tasks
Code Comparison
Airpal (SQL query):
SELECT user_id, COUNT(*) AS booking_count
FROM bookings
WHERE booking_date >= '2023-01-01'
GROUP BY user_id
HAVING COUNT(*) > 5
Spark (PySpark):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("BookingAnalysis").getOrCreate()
df = spark.read.table("bookings")
result = df.filter(df.booking_date >= '2023-01-01') \
    .groupBy("user_id") \
    .count() \
    .filter("count > 5")
While Airpal focuses on SQL queries for data analysis, Spark offers a more programmatic approach with support for complex data processing pipelines and advanced analytics. Airpal is more user-friendly for SQL-based querying, while Spark provides greater flexibility and scalability for large-scale data processing tasks.
Apache Drill is a distributed MPP query layer for self describing data
Pros of Drill
- Supports a wider range of data sources, including Hadoop, NoSQL, and cloud storage
- Offers more advanced query capabilities with SQL-like syntax
- Provides better scalability for large-scale data processing
Cons of Drill
- Steeper learning curve and more complex setup compared to Airpal
- Requires more system resources and infrastructure
- Less user-friendly interface for non-technical users
Code Comparison
Drill query example:
SELECT * FROM dfs.`/path/to/data/file.json`
WHERE age > 30
LIMIT 10;
Airpal query example:
SELECT * FROM hive.default.users
WHERE age > 30
LIMIT 10;
Key Differences
- Query Language: Drill uses a SQL-like syntax with extensions for nested data, while Airpal primarily uses Presto SQL.
- Data Sources: Drill supports a broader range of data sources, including schema-less data, while Airpal focuses on Presto-compatible sources.
- User Interface: Airpal provides a more user-friendly web interface, whereas Drill is primarily command-line driven with an optional web console.
- Performance: Drill is a query engine designed for high-performance querying of large-scale datasets, while Airpal is only a front end, so its query performance is that of the Presto cluster behind it.
- Community and Support: Drill has a larger community and more extensive documentation as an Apache project, while Airpal is deprecated and its small community was centered on Airbnb's use case.
Apache Impala
Pros of Impala
- Highly scalable and performant SQL query engine for Hadoop
- Supports a wide range of data formats and storage systems
- Offers low-latency queries on large datasets
Cons of Impala
- Requires more complex setup and maintenance
- Limited to Hadoop ecosystem
- May have higher resource requirements
Code Comparison
Airpal (JavaScript):
var AirpalApp = React.createClass({
  render: function() {
    return (
      <div className="airpal-app">
        <Header />
        <ExecutionBar />
        <TabArea />
      </div>
    );
  }
});
Impala (C++):
Status ImpalaServer::ExecutePlannedStmt(
    const TQueryCtx& query_ctx,
    shared_ptr<SessionState> session_state,
    const TExecRequest& exec_request,
    TExecResult* exec_result) {
  // Implementation details
}
Airpal is a web-based query interface for PrestoDB, while Impala is a distributed SQL query engine. Airpal focuses on user-friendly query authoring and result retrieval, whereas Impala emphasizes high-performance analytics on large-scale data. Airpal pairs a JavaScript front end with a Java back end, while Impala's execution engine is implemented in C++ for performance. The choice between them depends on specific use cases, existing infrastructure, and performance requirements.
README
DEPRECATED - Airpal
Airpal is deprecated, and most functionality and feature work has been moved to SQL Lab within Apache Superset.
Airpal is a web-based query execution tool which leverages Facebook's PrestoDB to make authoring queries and retrieving results simple for users. Airpal provides the ability to find tables, see metadata, browse sample rows, write and edit queries, then submit queries, all in a web interface. Once queries are running, users can track query progress and, when a query finishes, get the results back through the browser as a CSV (download it or share it with friends). The results of a query can be used to generate a new Hive table for subsequent analysis, and Airpal maintains a searchable history of all queries run within the tool.
Features
- Optional Access Control
- Syntax highlighting
- Results exported to a CSV for download or a Hive table
- Query history for self and others
- Saved queries
- Table finder to search for appropriate tables
- Table explorer to visualize schema of table and first 1000 rows
Requirements
- Java 7 or higher
- MySQL database
- Presto 0.77 or higher
- S3 bucket (to store CSVs)
- Gradle 2.2 or higher
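The minimum versions above are dotted numbers, and comparing them as plain strings gives the wrong answer: Presto 0.145 is newer than 0.77, yet the string "0.145" sorts before "0.77". A minimal sketch of a component-wise comparison follows. The helper names are illustrative, and it assumes purely numeric dotted versions, so a Java version string such as "1.7.0_80" would need extra handling.

```python
def parse_version(text):
    """Turn a dotted version such as '0.145' into a tuple of ints: (0, 145)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, compared component by component."""
    return parse_version(installed) >= parse_version(minimum)
```

With this, `meets_minimum("0.145", "0.77")` is true, matching the fact that 0.145 appears among the tested Presto versions.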
Steps to launch
- Build Airpal
  We'll be using Gradle to build the back-end Java code and a Node.js-based build pipeline (Browserify and Gulp) to build the front-end Javascript code.
  If you have node and npm installed locally, and wish to use them, simply run:
  ./gradlew clean shadowJar -Dairpal.useLocalNode
  Otherwise, node and npm will be automatically downloaded for you by running:
  ./gradlew clean shadowJar
  Specify Presto version by -Dairpal.prestoVersion:
  ./gradlew -Dairpal.prestoVersion=0.145 clean shadowJar
- Create a MySQL database for Airpal. We recommend you call it airpal and will assume that for future steps.
- Create a reference.yml file to store your configuration options.
  Start by copying over the example configuration, reference.example.yml.
  cp reference.example.yml reference.yml
  Then edit it to specify your MySQL credentials, and your S3 credentials if using S3 as a storage layer (Airpal defaults to local file storage, for demonstration purposes).
- Migrate your database.
  java -Duser.timezone=UTC \
       -cp build/libs/airpal-*-all.jar com.airbnb.airpal.AirpalApplication db migrate reference.yml
- Run Airpal.
  java -server \
       -Duser.timezone=UTC \
       -cp build/libs/airpal-*-all.jar com.airbnb.airpal.AirpalApplication server reference.yml
- Visit Airpal. Assuming you used the default settings in reference.yml you can now open http://localhost:8081 to use Airpal. Note that you might have to change the host, depending on where you deployed it.
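The migrate and run commands above rely on the shell to expand `airpal-*-all.jar`. If you launch Airpal from a script without a shell, that expansion has to be done explicitly. The sketch below is one way to build both argument lists. The function names are illustrative, and the jar location and main class are copied from the commands above.

```python
import glob
import os

MAIN_CLASS = "com.airbnb.airpal.AirpalApplication"  # taken from the commands above


def find_jar(project_dir):
    """Resolve build/libs/airpal-*-all.jar to exactly one path, as the shell glob would."""
    pattern = os.path.join(project_dir, "build", "libs", "airpal-*-all.jar")
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            "expected exactly one jar matching %s, found %d" % (pattern, len(matches))
        )
    return matches[0]


def airpal_command(project_dir, action, config="reference.yml"):
    """Build the argv for the 'migrate' or 'server' step."""
    if action == "migrate":
        jvm_flags = ["-Duser.timezone=UTC"]
        tail = ["db", "migrate", config]
    elif action == "server":
        jvm_flags = ["-server", "-Duser.timezone=UTC"]
        tail = ["server", config]
    else:
        raise ValueError("unknown action: %r" % action)
    return ["java"] + jvm_flags + ["-cp", find_jar(project_dir), MAIN_CLASS] + tail
```

The resulting list can be passed to `subprocess.run(cmd, cwd=project_dir, check=True)`, running the "migrate" command first and the "server" command second. Failing when zero or several jars match is a deliberate choice, since a stale build artifact would otherwise be picked up silently.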
Note: To override the configuration specified in reference.yml, you may specify certain settings on the command line in the traditional Dropwizard fashion, like so:
java -Ddw.prestoCoordinator=http://presto-coordinator-url.com \
     -Ddw.s3AccessKey=$ACCESS_KEY \
     -Ddw.s3SecretKey=$SECRET_KEY \
     -Ddw.s3Bucket=airpal \
     -Ddw.dataSourceFactory.url=jdbc:mysql://127.0.0.1:3306/airpal \
     -Ddw.dataSourceFactory.user=airpal \
     -Ddw.dataSourceFactory.password=$YOUR_PASSWORD \
     -Duser.timezone=UTC \
     -cp build/libs/airpal-*-all.jar com.airbnb.airpal.AirpalApplication db migrate reference.yml
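Each `-Ddw.` flag names a configuration key by its dotted path, so a nested setting such as the data source URL becomes `dataSourceFactory.url`. As a rough sketch, the flags can be generated from a nested dictionary that mirrors the configuration structure. The function names are illustrative, and the nesting shown is inferred from the dotted keys in the command above rather than taken from reference.example.yml.

```python
def flatten(settings, prefix=""):
    """Flatten nested dicts into dotted keys: {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in settings.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def dropwizard_overrides(settings):
    """Render settings as -Ddw.<dotted.key>=<value> JVM flags, sorted by key."""
    return ["-Ddw.%s=%s" % (key, value) for key, value in sorted(flatten(settings).items())]
```

The flags go before `-cp` on the `java` command line, as in the example above. Secrets such as the S3 secret key are better read from the environment at call time than hard-coded in the dictionary.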
Compatibility Chart
| Airpal Version | Presto Versions Tested |
|---|---|
| 0.1 | 0.77, 0.87, 0.145 |
In the Wild
Organizations and projects using airpal can list themselves here.
Contributors
- Andy Kramolisch @andykram
- Harry Shoff @hshoff
- Josh Perez @goatslacker
- Spike Brehm @spikebrehm
- Stefan Vermaas @stefanvermaas