Top Related Projects
nni: An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
MLflow: Open source platform for the machine learning lifecycle
Quick Overview
The Kaggle API is an official Python package that allows users to interact programmatically with Kaggle, a popular platform for data science competitions and datasets. It provides a command-line interface and Python library for accessing and managing Kaggle resources, including datasets, competitions, and kernels.
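The same operations are also exposed as shell commands once credentials are configured (see Getting Started below). A minimal sketch, where 'titanic' and submission.csv are illustrative placeholders:
kaggle datasets list -s covid
kaggle competitions download -c titanic
kaggle competitions submit -c titanic -f submission.csv -m "First submission"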
Pros
- Easy integration with existing data science workflows and scripts
- Enables automation of common Kaggle tasks, such as dataset downloads and competition submissions
- Provides a convenient way to access Kaggle's vast repository of datasets and competitions programmatically
- Supports both command-line and Python library usage for flexibility
Cons
- Limited to Python, which may not be ideal for users of other programming languages
- Requires API credentials, which need to be set up and managed securely
- Some advanced Kaggle features may not be fully supported or may have limited functionality
- Documentation could be more comprehensive for some less common use cases
Code Examples
- Downloading a dataset:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('zillow/zecon', path='./data')
- Submitting to a competition:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.competition_submit('path/to/submission.csv', 'Submission message', 'titanic')
- Listing available datasets:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
datasets = api.dataset_list(search='covid')
for dataset in datasets:
print(f"{dataset.ref}: {dataset.title}")
Getting Started
- Install the Kaggle API:
pip install kaggle
- Set up your API credentials:
  - Go to your Kaggle account settings (https://www.kaggle.com/account)
  - Click on "Create New API Token" to download kaggle.json
  - Place kaggle.json in ~/.kaggle/ on Linux/macOS or C:\Users\<Windows-username>\.kaggle\ on Windows (an environment-variable alternative is sketched after this list)
- Use the API in your Python script:
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
# Now you can use api.dataset_download_files(), api.competition_submit(), etc.
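If you prefer not to store a kaggle.json file, the client also reads credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables. A minimal sketch, with placeholder values you would replace with your own credentials:
import os

# Placeholders: substitute your Kaggle username and API key
os.environ["KAGGLE_USERNAME"] = "your-username"
os.environ["KAGGLE_KEY"] = "your-api-key"

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # picks up the environment variables instead of ~/.kaggle/kaggle.json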
Competitor Comparisons
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Pros of nni
- Broader scope: Supports various ML tasks beyond just Kaggle competitions
- More advanced features: Includes neural architecture search and model compression
- Flexible deployment: Can be used locally or on cloud platforms
Cons of nni
- Steeper learning curve: More complex to set up and use compared to kaggle-api
- Less focused: Not specifically tailored for Kaggle competitions
- Requires more configuration: May need more setup time for specific tasks
Code Comparison
kaggle-api:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('owner/dataset-name')
nni:
import nni

# A typical NNI trial script: fetch hyperparameters from the tuner,
# train with them, then report the resulting metric back to NNI.
params = nni.get_next_parameter()
accuracy = train_and_evaluate(params)  # your own training/evaluation routine
nni.report_final_result(accuracy)
The kaggle-api code focuses on downloading datasets, while the nni code shows a trial script that receives hyperparameters from a tuner and reports results back for tuning. This reflects the different purposes and scopes of the two libraries.
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Pros of wandb
- More comprehensive experiment tracking and visualization tools
- Supports a wider range of ML frameworks and integrations
- Offers collaborative features for team-based projects
Cons of wandb
- Steeper learning curve for beginners
- Requires more setup and configuration compared to Kaggle API
Code Comparison
wandb:
import wandb

# Start a run and record the hyperparameters for this experiment
wandb.init(project="my-project", config={"learning_rate": 0.01, "epochs": 100})

model.fit(X, y)                                  # train your model as usual
wandb.log({"accuracy": accuracy, "loss": loss})  # log metrics to the run
Kaggle API:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('owner/dataset-name')
api.competition_submit('submission.csv', 'Submission message', 'competition-name')
The wandb code snippet demonstrates experiment tracking and logging, while the Kaggle API code focuses on dataset management and competition submissions. wandb offers more detailed experiment monitoring, while Kaggle API provides simpler access to competition-related functionalities.
Open source platform for the machine learning lifecycle
Pros of MLflow
- More comprehensive ML lifecycle management (experiment tracking, model packaging, deployment)
- Supports multiple ML frameworks and languages
- Offers a web UI for experiment visualization and comparison
Cons of MLflow
- Steeper learning curve due to more complex features
- Requires more setup and infrastructure compared to Kaggle API
Code Comparison
MLflow:
import mlflow
mlflow.start_run()
mlflow.log_param("param1", 5)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()
Kaggle API:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('owner/dataset-name')
Key Differences
- MLflow focuses on the entire ML lifecycle, while Kaggle API primarily handles dataset and competition interactions
- MLflow offers more robust experiment tracking and model management features
- Kaggle API is simpler to use for specific Kaggle-related tasks
Use Cases
- MLflow: Best for teams managing complex ML projects across various frameworks
- Kaggle API: Ideal for data scientists participating in Kaggle competitions or working with Kaggle datasets
Both tools serve different purposes in the ML ecosystem, with MLflow being more comprehensive for ML lifecycle management and Kaggle API being specialized for Kaggle-specific interactions.
Kaggle API
Official API for https://www.kaggle.com, accessible using a command line tool implemented in Python 3.
Installation
Ensure you have Python 3 and the package manager pip installed.
Run the following command to access the Kaggle API using the command line:
pip install kaggle
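A quick sanity check after installing (the exact output depends on the installed version) is to ask the client for its version; most other commands will report a missing kaggle.json until credentials are configured:
kaggle --version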
Development
Kaggle Internal
Obviously, this depends on Kaggle services. When you're extending the API and modifying
or adding to those services, you should be working in your Kaggle mid-tier development
environment. You'll run Kaggle locally, in the container, and test the Python code by
running it in the container so it can connect to your local testing environment.
However, do not try to create a release from within the container. The code formatter (yapf3) changes much more than intended.
Also, run the following command to get autogen.sh installed:
rm -rf /tmp/autogen && mkdir -p /tmp/autogen && unzip -qo /tmp/autogen.zip -d /tmp/autogen &&
mv /tmp/autogen/autogen-*/* /tmp/autogen && rm -rf /tmp/autogen/autogen-* &&
sudo chmod a+rx /tmp/autogen/autogen.sh
Prerequisites
We use hatch to manage this project.
Follow these instructions to install it.
If you are working in a managed environment, you may want to use pipx. If it isn't already installed, try sudo apt install pipx. Then you should be able to proceed with pipx install hatch.
Dependencies
hatch run install-deps
Compile
hatch run compile
The compiled files are generated in the kaggle/ directory from the src/ directory. All changes must be made in the src/ directory.
Run
You can also run the code in Python directly:
hatch run python
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.model_list_cli()
Next Page Token = [...]
[...]
Or in a single command:
hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
Example
Let's change the model_list_cli method in the source file:
❯ git diff src/kaggle/api/kaggle_api_extended.py
[...]
+ print('hello Kaggle CLI update')^M
models = self.model_list(sort_by, search, owner, page_size, page_token)
[...]
❯ hatch run compile
[...]
❯ hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
hello Kaggle CLI update
Next Page Token = [...]
Integration Tests
To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in one of the two ways described in this doc. Refer to the sections:
- Using environment variables
- Using credentials file
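For the environment-variable option, the client reads KAGGLE_USERNAME and KAGGLE_KEY (the values below are placeholders), e.g.:
export KAGGLE_USERNAME=your-username
export KAGGLE_KEY=your-api-key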
After setting up your credentials by any of these methods, you can run the integration tests as follows:
# Run all tests
hatch run integration-test
License
The Kaggle API is released under the Apache 2.0 license.