tablib

Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.

4,680

596

Top Related Projects

csvkit

6,116

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

pandas

45,255

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

xarray

3,775

N-D labeled arrays and datasets in Python

Quick Overview

Tablib is a powerful, format-agnostic tabular dataset library for Python. It allows you to import, export, and manipulate tabular data with ease, supporting various formats such as CSV, JSON, YAML, and Excel.

Pros

Supports multiple data formats (CSV, JSON, YAML, Excel, etc.)
Easy-to-use API for data manipulation and transformation
Extensible architecture allowing custom format support
Seamless integration with other Python libraries and frameworks

Cons

Limited support for large datasets: data is held in memory, so performance degrades with very large files
Some advanced Excel features are not fully supported
Documentation could be more comprehensive for complex use cases
Some formats depend on external libraries (e.g., openpyxl for XLSX), which are installed through extras such as tablib[xlsx]

Code Examples

Creating a dataset and adding rows:

import tablib

data = tablib.Dataset(headers=['Name', 'Age', 'Country'])
data.append(['John Doe', 30, 'USA'])
data.append(['Jane Smith', 25, 'Canada'])
print(data)

Exporting data to different formats:

# Export to CSV
csv_data = data.export('csv')

# Export to JSON
json_data = data.export('json')

# Export to Excel (returns bytes; requires openpyxl)
excel_data = data.export('xlsx')

Importing data from a file:

with open('data.csv', 'r') as f:
    imported_data = tablib.Dataset().load(f.read(), format='csv')
print(imported_data)

Getting Started

To get started with Tablib, first install it using pip:

pip install tablib

Then, you can create a simple dataset and manipulate it:

import tablib

# Create a dataset
data = tablib.Dataset(headers=['Name', 'Age'])
data.append(['Alice', 28])
data.append(['Bob', 32])

# Add a column
data.append_col([True, False], header='Active')

# Export to CSV
csv_output = data.export('csv')
print(csv_output)

This example creates a dataset, adds some rows and a column, and then exports it to CSV format.

Competitor Comparisons

records

7,191

SQL for Humans™

Pros of Records

Simpler API for database operations
Built-in SQL query support
Automatic connection management

Cons of Records

Limited to database operations
Less flexible data manipulation
No export formats of its own; export is delegated to Tablib

Code Comparison

Records:

import records

db = records.Database('postgresql://...')
rows = db.query('SELECT * FROM users')
print(rows.export('csv'))

Tablib:

import tablib

data = tablib.Dataset()
data.append(['John', 'Doe', 30])
data.append(['Jane', 'Smith', 25])
print(data.export('csv'))

Key Differences

Records focuses on database operations, while Tablib is for general data manipulation
Records delegates export to Tablib, so both share the same output formats (JSON, YAML, XLS, etc.)
Records provides a more streamlined approach for working with databases
Tablib allows for more flexible data structuring and manipulation

Use Cases

Records is ideal for:

Quick database queries and exports
Simple data analysis from databases

Tablib is better for:

Working with various data formats
Creating and manipulating datasets from multiple sources
Exporting data to multiple formats

Both libraries have their strengths, and the choice depends on the specific requirements of your project.

csvkit

6,116

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

Pros of csvkit

Comprehensive command-line toolkit for CSV operations
Supports a wide range of CSV manipulations and transformations
Integrates well with Unix-style pipelines and shell scripts

Cons of csvkit

Limited support for other data formats beyond CSV
Steeper learning curve for users unfamiliar with command-line tools
Less suitable for programmatic use within Python applications

Code Comparison

csvkit (command-line usage):

csvcut -c 1,3 data.csv | csvgrep -c 1 -m "pattern" | csvsort -c 3

tablib (Python code):

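import tablib

# Load the CSV; the first row becomes the headers
with open('data.csv', newline='') as f:
    data = tablib.Dataset().load(f.read(), format='csv')
# Dataset.filter() selects rows by tag, so match the pattern with a comprehension
filtered = tablib.Dataset(*[row for row in data if 'pattern' in row[0]], headers=data.headers)
sorted_data = filtered.sort('column3')  # 'column3' is a placeholder header name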

Summary

csvkit excels in command-line CSV processing, offering powerful tools for data manipulation. It's ideal for shell scripting and quick data transformations. tablib, on the other hand, provides a more Pythonic approach to working with tabular data, supporting multiple formats and offering an intuitive API for in-memory data manipulation. While csvkit is more specialized for CSV operations, tablib offers greater flexibility in handling various data formats and integrating with Python applications.

pandas

45,255

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Pros of pandas

More comprehensive data manipulation and analysis capabilities
Highly optimized for performance with large datasets
Extensive documentation and community support

Cons of pandas

Steeper learning curve for beginners
Larger library size and memory footprint
Can be overkill for simple data tasks

Code Comparison

tablib example:

import tablib

data = tablib.Dataset()
data.headers = ['Name', 'Age']
data.append(['Alice', 30])
data.append(['Bob', 25])
print(data.export('csv'))

pandas example:

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
print(df.to_csv(index=False))

Both libraries make it easy to build and export a small table. tablib is lighter and focused on moving tabular data between formats, which keeps basic tasks simple. pandas adds tools for cleaning, transforming, and analyzing data and copes better with large datasets, but it can be excessive for simple import and export work.

xarray

3,775

N-D labeled arrays and datasets in Python

Pros of xarray

Designed for working with multi-dimensional labeled arrays and datasets
Powerful data analysis capabilities, especially for scientific and geospatial data
Integrates well with other scientific Python libraries like NumPy and pandas

Cons of xarray

Steeper learning curve due to more complex data structures and operations
May be overkill for simple tabular data tasks
Larger library size and potentially slower performance for basic operations

Code Comparison

xarray:

import xarray as xr

data = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [100, 200, 300]}
)
result = data.mean(dim="y")

tablib:

import tablib

data = tablib.Dataset()
data.append([1, 2, 3])
data.append([4, 5, 6])
data.headers = ['A', 'B', 'C']
result = data.export('csv')

xarray is more suitable for complex, multi-dimensional data analysis, while tablib excels at simple tabular data manipulation and export to various formats. xarray offers more advanced features but requires more setup, whereas tablib provides a straightforward interface for basic data handling tasks.

README

Tablib: format-agnostic tabular dataset library

_____         ______  ___________ ______
__  /_______ ____  /_ ___  /___(_)___  /_
_  __/_  __ `/__  __ \__  / __  / __  __ \
/ /_  / /_/ / _  /_/ /_  /  _  /  _  /_/ /
\__/  \__,_/  /_.___/ /_/   /_/   /_.___/

Tablib is a format-agnostic tabular dataset library, written in Python.

Output formats supported:

Excel (Sets + Books)
JSON (Sets + Books)
YAML (Sets + Books)
Pandas DataFrames (Sets)
HTML (Sets)
Jira (Sets)
LaTeX (Sets)
TSV (Sets)
ODS (Sets)
CSV (Sets)
DBF (Sets)

Note that tablib purposefully excludes XML support. It always will. (Note: This is a joke. Pull requests are welcome.)

Tablib documentation is graciously hosted on https://tablib.readthedocs.io

It is also available in the docs directory of the source distribution.

Make sure to check out Tablib on PyPI!

Contribute

Please see the contributing guide.
