Top Related Projects
nni: An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
MLflow: Open source platform for the machine learning lifecycle
Quick Overview
The Kaggle API is an official Python package that allows users to interact programmatically with Kaggle, a popular platform for data science competitions and datasets. It provides a command-line interface and Python library for accessing and managing Kaggle resources, including datasets, competitions, and kernels.
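The same operations are also exposed as shell commands once credentials are configured (see Getting Started below). A minimal sketch, where 'titanic' and submission.csv are illustrative placeholders:
kaggle datasets list -s covid
kaggle competitions download -c titanic
kaggle competitions submit -c titanic -f submission.csv -m "First submission"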
Pros
- Easy integration with existing data science workflows and scripts
- Enables automation of common Kaggle tasks, such as dataset downloads and competition submissions
- Provides a convenient way to access Kaggle's vast repository of datasets and competitions programmatically
- Supports both command-line and Python library usage for flexibility
Cons
- Limited to Python, which may not be ideal for users of other programming languages
- Requires API credentials, which need to be set up and managed securely
- Some advanced Kaggle features may not be fully supported or may have limited functionality
- Documentation could be more comprehensive for some less common use cases
Code Examples
- Downloading a dataset:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('zillow/zecon', path='./data')
- Submitting to a competition:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.competition_submit('path/to/submission.csv', 'Submission message', 'titanic')
- Listing available datasets:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
datasets = api.dataset_list(search='covid')
for dataset in datasets:
print(f"{dataset.ref}: {dataset.title}")
Getting Started
- Install the Kaggle API:
pip install kaggle
- Set up your API credentials:
  - Go to your Kaggle account settings (https://www.kaggle.com/account)
  - Click on "Create New API Token" to download kaggle.json
  - Place kaggle.json in ~/.kaggle/ on Linux/macOS or C:\Users\<Windows-username>\.kaggle\ on Windows (an environment-variable alternative is sketched after this list)
- Use the API in your Python script:
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
# Now you can use api.dataset_download_files(), api.competition_submit(), etc.
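If you prefer not to store a kaggle.json file, the client also reads credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables. A minimal sketch, with placeholder values you would replace with your own credentials:
import os

# Placeholders: substitute your Kaggle username and API key
os.environ["KAGGLE_USERNAME"] = "your-username"
os.environ["KAGGLE_KEY"] = "your-api-key"

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # picks up the environment variables instead of ~/.kaggle/kaggle.json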
Competitor Comparisons
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Pros of nni
- Broader scope: Supports various ML tasks beyond just Kaggle competitions
- More advanced features: Includes neural architecture search and model compression
- Flexible deployment: Can be used locally or on cloud platforms
Cons of nni
- Steeper learning curve: More complex to set up and use compared to kaggle-api
- Less focused: Not specifically tailored for Kaggle competitions
- Requires more configuration: May need more setup time for specific tasks
Code Comparison
kaggle-api:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('owner/dataset-name')
nni:
import nni

# A typical NNI trial script: fetch hyperparameters from the tuner,
# train with them, then report the resulting metric back to NNI.
params = nni.get_next_parameter()
accuracy = train_and_evaluate(params)  # your own training/evaluation routine
nni.report_final_result(accuracy)
The kaggle-api code focuses on downloading datasets, while the nni code shows a trial script that receives hyperparameters from a tuner and reports results back for tuning. This reflects the different purposes and scopes of the two libraries.
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Pros of wandb
- More comprehensive experiment tracking and visualization tools
- Supports a wider range of ML frameworks and integrations
- Offers collaborative features for team-based projects
Cons of wandb
- Steeper learning curve for beginners
- Requires more setup and configuration compared to Kaggle API
Code Comparison
wandb:
import wandb

# Start a run and record the hyperparameters for this experiment
wandb.init(project="my-project", config={"learning_rate": 0.01, "epochs": 100})

model.fit(X, y)                                  # train your model as usual
wandb.log({"accuracy": accuracy, "loss": loss})  # log metrics to the run
Kaggle API:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('owner/dataset-name')
api.competition_submit('submission.csv', 'Submission message', 'competition-name')
The wandb code snippet demonstrates experiment tracking and logging, while the Kaggle API code focuses on dataset management and competition submissions. wandb offers more detailed experiment monitoring, while Kaggle API provides simpler access to competition-related functionalities.
Open source platform for the machine learning lifecycle
Pros of MLflow
- More comprehensive ML lifecycle management (experiment tracking, model packaging, deployment)
- Supports multiple ML frameworks and languages
- Offers a web UI for experiment visualization and comparison
Cons of MLflow
- Steeper learning curve due to more complex features
- Requires more setup and infrastructure compared to Kaggle API
Code Comparison
MLflow:
import mlflow
mlflow.start_run()
mlflow.log_param("param1", 5)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()
Kaggle API:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('owner/dataset-name')
Key Differences
- MLflow focuses on the entire ML lifecycle, while Kaggle API primarily handles dataset and competition interactions
- MLflow offers more robust experiment tracking and model management features
- Kaggle API is simpler to use for specific Kaggle-related tasks
Use Cases
- MLflow: Best for teams managing complex ML projects across various frameworks
- Kaggle API: Ideal for data scientists participating in Kaggle competitions or working with Kaggle datasets
Both tools serve different purposes in the ML ecosystem, with MLflow being more comprehensive for ML lifecycle management and Kaggle API being specialized for Kaggle-specific interactions.
Kaggle API
Official API for https://www.kaggle.com, accessible using a command line tool implemented in Python 3.
Installation
Ensure you have Python 3 and the package manager pip installed.
Run the following command to access the Kaggle API using the command line:
pip install kaggle
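A quick sanity check after installing (the exact output depends on the installed version) is to ask the client for its version; most other commands will report a missing kaggle.json until credentials are configured:
kaggle --version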
Development
Kaggle Internal
Obviously, this depends on Kaggle services. When you're extending the API and modifying
or adding to those services, you should be working in your Kaggle mid-tier development
environment. You'll run Kaggle locally, in the container, and test the Python code by
running it in the container so it can connect to your local testing environment.
However, do not try to create a release from within the container. The code formatter (yapf3) changes much more than intended.
Also, run the following command to get autogen.sh installed:
rm -rf /tmp/autogen && mkdir -p /tmp/autogen && unzip -qo /tmp/autogen.zip -d /tmp/autogen &&
mv /tmp/autogen/autogen-*/* /tmp/autogen && rm -rf /tmp/autogen/autogen-* &&
sudo chmod a+rx /tmp/autogen/autogen.sh
Prerequisites
We use hatch to manage this project.
Follow these instructions to install it.
If you are working in a managed environment, you may want to use pipx. If it isn't already installed, try sudo apt install pipx. Then you should be able to proceed with pipx install hatch.
Dependencies
hatch run install-deps
Compile
hatch run compile
The compiled files are generated in the kaggle/ directory from the src/ directory. All changes must be made in the src/ directory.
Run
You can also run the code in Python directly:
hatch run python
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.model_list_cli()
Next Page Token = [...]
[...]
Or in a single command:
hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
Example
Let's change the model_list_cli method in the source file:
❯ git diff src/kaggle/api/kaggle_api_extended.py
[...]
+ print('hello Kaggle CLI update')^M
models = self.model_list(sort_by, search, owner, page_size, page_token)
[...]
❯ hatch run compile
[...]
❯ hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
hello Kaggle CLI update
Next Page Token = [...]
Integration Tests
To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in one of the two ways described in this doc. Refer to the sections:
- Using environment variables
- Using credentials file
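For the environment-variable option, the client reads KAGGLE_USERNAME and KAGGLE_KEY (the values below are placeholders), e.g.:
export KAGGLE_USERNAME=your-username
export KAGGLE_KEY=your-api-key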
After setting up your credentials by any of these methods, you can run the integration tests as follows:
# Run all tests
hatch run integration-test
License
The Kaggle API is released under the Apache 2.0 license.