microsoft/qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning (RL).

Top Related Projects

  • NNI (14,100): An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
  • MLflow (18,503): Open source platform for the machine learning lifecycle
  • Ray (34,860): Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
  • Optuna (10,708): A hyperparameter optimization framework
  • FLAML (3,881): A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
  • Recommenders: Best Practices on Recommendation Systems

Quick Overview

Qlib (Quantitative Library) is an AI-oriented quantitative investment platform developed by Microsoft. It aims to realize the potential of AI technologies in quantitative investment, providing a comprehensive suite of tools for data processing, model training, back-testing, and decision-making in financial markets.

Pros

  • Comprehensive ecosystem for quantitative investment research and deployment
  • Supports various machine learning models and traditional quant research methods
  • Provides high-quality financial datasets and data processing tools
  • Offers flexible and extensible architecture for customization

Cons

  • Steep learning curve for beginners in quantitative finance
  • Limited documentation for some advanced features
  • Requires significant computational resources for large-scale experiments
  • Primarily focused on the Chinese stock market, which may limit its applicability for global investors

Code Examples

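  1. Initializing Qlib and loading data:
import qlib
from qlib.config import REG_CN
from qlib.data import D

# Point Qlib at the locally prepared China-market data
provider_uri = "~/.qlib/qlib_data/cn_data"
qlib.init(provider_uri=provider_uri, region=REG_CN)

# Load daily close and volume for the CSI 300 constituents
instruments = D.instruments(market="csi300")
df = D.features(instruments, ["$close", "$volume"], start_time="2020-01-01", end_time="2020-01-31")
print(df.head())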
  2. Creating and training a model:
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.utils import init_instance_by_config

market = "csi300"
benchmark = "SH000300"

data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": market,
}

task = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {
            "loss": "mse",
            "colsample_bytree": 0.8879,
            "learning_rate": 0.0421,
            "subsample": 0.8789,
            "lambda_l1": 205.6999,
            "lambda_l2": 580.9768,
            "max_depth": 8,
            "num_leaves": 210,
            "num_threads": 20,
        },
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}

model = init_instance_by_config(task["model"])
dataset = init_instance_by_config(task["dataset"])

model.fit(dataset)
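  3. Back-testing a strategy:
# Import paths follow recent Qlib releases; older versions used different modules
from qlib.contrib.evaluate import backtest_daily
from qlib.contrib.strategy import TopkDropoutStrategy

# pred_score holds the model's predictions, e.g. pred_score = model.predict(dataset)
strategy_config = {
    "topk": 50,
    "n_drop": 5,
    "signal": pred_score,
}

strategy = TopkDropoutStrategy(**strategy_config)
report_normal, positions_normal = backtest_daily(
    start_time="2017-01-01", end_time="2020-08-01", strategy=strategy
)

print(report_normal)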
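
To make the topk and n_drop parameters concrete, here is a minimal, self-contained sketch of one rebalancing step of a top-k dropout rule. It is a simplified illustration written for this page, not Qlib's TopkDropoutStrategy implementation; the function name topk_dropout_step and the sample scores are hypothetical.

```python
def topk_dropout_step(scores, holdings, topk, n_drop):
    """One rebalancing step of a simplified top-k dropout rule.

    scores:   dict mapping instrument -> predicted score (higher is better)
    holdings: list of instruments currently held
    Returns (new_holdings, sold, bought).
    """
    def rank(names):
        # Best score first; instruments without a score rank last.
        return sorted(names, key=lambda s: scores.get(s, float("-inf")), reverse=True)

    held = rank(holdings)
    candidates = rank([s for s in scores if s not in holdings])

    # Consider enough new names to fill empty slots plus n_drop replacements.
    n_new = max(n_drop + topk - len(held), 0)
    newcomers = candidates[:n_new]

    # Held names that land in the bottom n_drop of the combined ranking are sold.
    combined = rank(held + newcomers)
    bottom = set(combined[-n_drop:]) if n_drop > 0 else set()
    sold = [s for s in held if s in bottom]

    # Buy enough newcomers to get back to topk positions.
    n_buy = max(len(sold) + topk - len(held), 0)
    bought = newcomers[:n_buy]
    new_holdings = [s for s in held if s not in sold] + bought
    return new_holdings, sold, bought


# Hypothetical scores for five instruments; three are currently held.
scores = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.7, "E": 0.6}
new_holdings, sold, bought = topk_dropout_step(scores, ["A", "B", "C"], topk=3, n_drop=1)
print(new_holdings, sold, bought)
```

With these sample scores the weakest held name is swapped for the best unheld one, so turnover per step is bounded by n_drop.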

Getting Started

To get started with Qlib:

  1. Install Qlib:
pip install pyqlib
  2. Initialize Qlib and download data:
import qlib
from qlib.config import REG_CN
provider_uri = "~/.qlib/q

Competitor Comparisons

14,100

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Pros of NNI

  • Broader scope: Supports various AI tasks beyond quantitative finance
  • More extensive AutoML capabilities: Includes feature engineering, hyperparameter tuning, and neural architecture search
  • Larger community and more frequent updates

Cons of NNI

  • Steeper learning curve due to its broader scope
  • May be overkill for projects focused solely on quantitative finance
  • Less specialized tools for financial modeling compared to Qlib

Code Comparison

NNI example (model definition):

from tensorflow import keras

def create_model():
    # Small fully connected regression network with 20 input features
    model = keras.Sequential([
        keras.layers.Dense(64, activation='relu', input_shape=(20,)),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

Qlib example (model usage):

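from qlib.contrib.model.pytorch_lstm import LSTM
from qlib.data.dataset import DatasetH

# LSTM is the trainable Qlib model; LSTMModel in the same module is the underlying torch network
model = LSTM()
# handler and segments are assumed to be defined elsewhere (a data handler and train/valid/test date ranges)
dataset = DatasetH(handler=handler, segments=segments)
model.fit(dataset)
pred = model.predict(dataset)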

Both repositories offer powerful tools for machine learning and quantitative finance, but NNI provides a more comprehensive suite for general AI tasks, while Qlib specializes in financial applications. NNI's broader scope may require more setup time, but it offers greater flexibility for diverse projects. Qlib, on the other hand, provides a more streamlined experience for quantitative finance-specific tasks.

18,503

Open source platform for the machine learning lifecycle

Pros of MLflow

  • Broader scope: MLflow is a general-purpose ML lifecycle management platform, suitable for various ML projects and domains
  • Extensive tracking capabilities: Offers comprehensive experiment tracking, model versioning, and deployment features
  • Active community: Large user base and frequent updates, resulting in better support and resources

Cons of MLflow

  • Steeper learning curve: More complex setup and configuration due to its broader feature set
  • Less specialized: Not tailored specifically for quantitative investment tasks like Qlib

Code Comparison

MLflow:

import mlflow

mlflow.start_run()
mlflow.log_param("param1", value1)
mlflow.log_metric("metric1", value2)
mlflow.end_run()

Qlib:

from qlib.workflow import R

# Parameters and metrics are logged to the recorder of the active experiment
with R.start(experiment_name="demo"):
    R.log_params(param1=value1)
    R.log_metrics(metric1=value2)

Both libraries offer ways to log parameters and metrics, but MLflow's API is more straightforward and widely applicable to various ML tasks. Qlib's API is more specialized for quantitative investment workflows.

34,860

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Pros of Ray

  • More general-purpose distributed computing framework, suitable for a wide range of applications beyond just quantitative finance
  • Larger and more active community, with frequent updates and contributions
  • Extensive ecosystem of libraries and tools built on top of Ray

Cons of Ray

  • Steeper learning curve due to its broader scope and more complex architecture
  • May be overkill for projects focused solely on quantitative finance tasks
  • Less specialized features for financial modeling compared to Qlib

Code Comparison

Ray example:

import ray

@ray.remote
def process_data(data):
    # Perform data processing
    return processed_result

results = ray.get([process_data.remote(d) for d in dataset])

Qlib example:

from qlib.model.trainer import task_train

# task is a config dict with "model" and "dataset" entries (class, module_path, kwargs),
# such as the one built in the training example above
recorder = task_train(task, experiment_name="lgb_demo")

Both frameworks offer powerful capabilities, but Ray focuses on distributed computing across various domains, while Qlib specializes in quantitative investment tasks.

10,708

A hyperparameter optimization framework

Pros of Optuna

  • More versatile, supporting a wide range of machine learning tasks beyond quantitative finance
  • Larger community and more frequent updates, leading to better support and documentation
  • Offers advanced features like pruning and parallel optimization

Cons of Optuna

  • Less specialized for quantitative finance tasks compared to Qlib
  • May require more setup and configuration for finance-specific use cases
  • Lacks built-in financial data handling and preprocessing capabilities

Code Comparison

Optuna example:

import optuna

def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=100)

Qlib example:

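from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.data.dataset import DatasetH
from qlib.workflow import R

model = LGBModel()
handler = Alpha158(instruments="csi300")  # Alpha158 is a data handler, not a dataset
segments = {"train": ("2008-01-01", "2014-12-31"), "valid": ("2015-01-01", "2016-12-31")}
dataset = DatasetH(handler=handler, segments=segments)
with R.start(experiment_name="lgb_demo"):
    model.fit(dataset)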

The code examples highlight Optuna's focus on hyperparameter optimization across various domains, while Qlib provides a more streamlined approach for quantitative investment tasks with built-in models and data handlers.

3,881

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.

Pros of FLAML

  • Focuses on automated machine learning (AutoML) for a wide range of tasks, including classification, regression, and time series forecasting
  • Offers efficient hyperparameter optimization with a budget-aware approach
  • Provides easy integration with popular ML frameworks like scikit-learn and XGBoost

Cons of FLAML

  • Less specialized for quantitative finance applications compared to Qlib
  • May require additional customization for specific financial modeling tasks
  • Lacks built-in financial data handling and preprocessing features

Code Comparison

FLAML example:

from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
predictions = automl.predict(X_test)

Qlib example:

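from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord

# model and dataset are assumed to be built as in the earlier examples
with R.start(experiment_name="demo"):
    model.fit(dataset)
    recorder = R.get_recorder()
    SignalRecord(model, dataset, recorder).generate()  # saves the predictions to the recorder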

Summary

FLAML is a versatile AutoML library suitable for various machine learning tasks, while Qlib is specifically designed for quantitative investment. FLAML offers efficient hyperparameter tuning and easy integration with popular ML frameworks, but may require additional customization for financial applications. Qlib provides built-in features for financial data handling and analysis, making it more suitable for quantitative finance tasks out of the box.

Best Practices on Recommendation Systems

Pros of Recommenders

  • Broader focus on general recommendation systems, not limited to financial applications
  • More extensive documentation and tutorials for beginners
  • Larger community and more frequent updates

Cons of Recommenders

  • Less specialized for quantitative finance tasks
  • May require more customization for specific financial use cases
  • Potentially slower performance for large-scale financial data processing

Code Comparison

Qlib example (time series forecasting):

from qlib.contrib.model.pytorch_lstm import LSTM
from qlib.contrib.data.handler import Alpha158
model = LSTM()
handler = Alpha158()  # data handler that computes the Alpha158 features

Recommenders example (collaborative filtering):

from recommenders.models.ncf.ncf_singlenode import NCF
from recommenders.datasets import movielens
model = NCF(n_users, n_items, model_type="NeuMF")
train, test = movielens.load_pandas_df()

Both repositories offer powerful tools for their respective domains, with Qlib focusing on quantitative investment and Recommenders covering a broader range of recommendation tasks. The choice between them depends on the specific use case and required level of specialization in financial applications.

README

Join the chat at https://gitter.im/Microsoft/qlib

What's NEW!

Recently released features

Introducing RD_Agent: LLM-Based Autonomous Evolving Agents for Industrial Data-Driven R&D

We are excited to announce the release of RD-Agent📢, a powerful tool that supports automated factor mining and model optimization in quant investment R&D.

RD-Agent is now available on GitHub, and we welcome your star🌟!

To learn more, please visit our ♾️Demo page. Here, you will find demo videos in both English and Chinese to help you better understand the scenario and usage of RD-Agent.

We have prepared several demo videos for you:

Scenario | Demo video (English) | Demo video (Chinese)
Quant Factor Mining | Link | Link
Quant Factor Mining from reports | Link | Link
Quant Model Optimization | Link | Link

Feature | Status
BPQP for End-to-end learning | 📈 Coming soon! (Under review)
🔥 LLM-driven Auto Quant Factory 🔥 | 🚀 Released in ♾️RD-Agent on Aug 8, 2024
KRNN and Sandwich models | 📈 Released on May 26, 2023
Release Qlib v0.9.0 | Released on Dec 9, 2022
RL Learning Framework | 🔨 📈 Released on Nov 10, 2022. #1332, #1322, #1316, #1299, #1263, #1244, #1169, #1125, #1076
HIST and IGMTF models | 📈 Released on Apr 10, 2022
Qlib notebook tutorial | 📖 Released on Apr 7, 2022
Ibovespa index data | 🍚 Released on Apr 6, 2022
Point-in-Time database | 🔨 Released on Mar 10, 2022
Arctic Provider Backend & Orderbook data example | 🔨 Released on Jan 17, 2022
Meta-Learning-based framework & DDG-DA | 📈 🔨 Released on Jan 10, 2022
Planning-based portfolio optimization | 🔨 Released on Dec 28, 2021
Release Qlib v0.8.0 | Released on Dec 8, 2021
ADD model | 📈 Released on Nov 22, 2021
ADARNN model | 📈 Released on Nov 14, 2021
TCN model | 📈 Released on Nov 4, 2021
Nested Decision Framework | 🔨 Released on Oct 1, 2021. Example and Doc
Temporal Routing Adaptor (TRA) | 📈 Released on July 30, 2021
Transformer & Localformer | 📈 Released on July 22, 2021
Release Qlib v0.7.0 | Released on July 12, 2021
TCTS Model | 📈 Released on July 1, 2021
Online serving and automatic model rolling | 🔨 Released on May 17, 2021
DoubleEnsemble Model | 📈 Released on Mar 2, 2021
High-frequency data processing example | 🔨 Released on Feb 5, 2021
High-frequency trading example | 📈 Part of code released on Jan 28, 2021
High-frequency data (1min) | 🍚 Released on Jan 27, 2021
Tabnet Model | 📈 Released on Jan 22, 2021

Features released before 2021 are not listed here.

Qlib is an open-source, AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning.

An increasing number of SOTA Quant research works/papers in diverse paradigms are being released in Qlib to collaboratively solve key challenges in quantitative investment. For example, 1) using supervised learning to mine the market's complex non-linear patterns from rich and heterogeneous financial data, 2) modeling the dynamic nature of the financial market using adaptive concept drift technology, and 3) using reinforcement learning to model continuous investment decisions and assist investors in optimizing their trading strategies.

It contains the full ML pipeline of data processing, model training, back-testing; and covers the entire chain of quantitative investment: alpha seeking, risk modeling, portfolio optimization, and order execution. For more details, please refer to our paper "Qlib: An AI-oriented Quantitative Investment Platform".

Frameworks, Tutorial, Data & DevOps Main Challenges & Solutions in Quant Research
  • Plans
  • Framework of Qlib
  • Quick Start
  • Quant Dataset Zoo
  • Learning Framework
  • More About Qlib
  • Offline Mode and Online Mode
  • Related Reports
  • Contact Us
  • Contributing
  • Main Challenges & Solutions in Quant Research

    Plans

    New features under development (ordered by estimated release time). Your feedback about the features is very important.

    Framework of Qlib

    The high-level framework of Qlib can be found above (users can find the detailed framework of Qlib's design when getting into the nitty-gritty). The components are designed as loosely coupled modules, and each component can be used stand-alone.

    Qlib provides a strong infrastructure to support Quant research. Data is always an important part. A strong learning framework is designed to support diverse learning paradigms (e.g. reinforcement learning, supervised learning) and patterns at different levels (e.g. market dynamic modeling). By modeling the market, trading strategies generate trade decisions that are then executed. Multiple trading strategies and executors at different levels or granularities can be nested so that they are optimized and run together. Finally, a comprehensive analysis is provided and the model can be served online at low cost.

    Quick Start

    This quick start guide tries to demonstrate that:

    1. It's very easy to build a complete Quant research workflow and try your ideas with Qlib.
    2. Even with public data and simple models, machine learning technologies work very well in practical Quant investment.

    Here is a quick demo that shows how to install Qlib and run LightGBM with qrun. Please make sure you have already prepared the data following the instructions.

    Installation

    This table demonstrates the supported Python version of Qlib:

    Python version | install with pip | install from source | plot
    Python 3.7 | ✔ | ✔ | ✔
    Python 3.8 | ✔ | ✔ | ✔
    Python 3.9 | ✘ | ✔ | ✘

    Note:

    1. Conda is suggested for managing your Python environment. In some cases, using Python outside of a conda environment may result in missing header files, causing the installation of certain packages to fail.
    2. Please note that installing cython in Python 3.6 will raise errors when installing Qlib from source. If users use Python 3.6 on their machines, it is recommended to upgrade Python to version 3.7 or use conda's Python to install Qlib from source.
    3. For Python 3.9, Qlib supports running workflows such as training models, doing backtests, and plotting most of the related figures (those included in the notebook). However, plotting the model performance is not supported for now; this will be fixed when the dependent packages are upgraded in the future.
    4. Qlib requires the tables package, and hdf5 in tables does not support Python 3.9.

    Install with pip

    Users can easily install Qlib by pip according to the following command.

      pip install pyqlib
    

    Note: pip will install the latest stable qlib. However, the main branch of qlib is in active development. If you want to test the latest scripts or functions in the main branch, please install qlib with the methods below.

    Install from source

    Users can also install the latest dev version of Qlib from source with the following steps:

    • Before installing Qlib from source, users need to install some dependencies:

      pip install numpy
      pip install --upgrade  cython
      
    • Clone the repository and install Qlib as follows.

      git clone https://github.com/microsoft/qlib.git && cd qlib
      pip install .  # `pip install -e .[dev]` is recommended for development. check details in docs/developer/code_standard_and_dev_guide.rst
      

      Note: You can install Qlib with python setup.py install as well. But it is not the recommended approach. It will skip pip and cause obscure problems. For example, only the command pip install . can overwrite the stable version installed by pip install pyqlib, while the command python setup.py install can't.

    Tips: If you fail to install Qlib or run the examples in your environment, comparing your steps and the CI workflow may help you find the problem.

    Tips for Mac: If you are using Mac with M1, you might encounter issues in building the wheel for LightGBM, which is due to missing dependencies from OpenMP. To solve the problem, install openmp first with brew install libomp and then run pip install . to build it successfully.

    Data Preparation

    ❗ Due to a stricter data security policy, the official dataset is temporarily disabled. You can try this data source contributed by the community. Here is an example that downloads the data updated on 20240809.

    wget https://github.com/chenditc/investment_data/releases/download/2024-08-09/qlib_bin.tar.gz
    mkdir -p ~/.qlib/qlib_data/cn_data
    tar -zxvf qlib_bin.tar.gz -C ~/.qlib/qlib_data/cn_data --strip-components=1
    rm -f qlib_bin.tar.gz
    

    The official dataset below will resume in the near future.


    Load and prepare data by running the following code:

    Get with module

    # get 1d data
    python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
    
    # get 1min data
    python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min
    
    

    Get from source

    # get 1d data
    python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
    
    # get 1min data
    python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min
    
    

    This dataset is created from public data collected by crawler scripts, which have been released in the same repository. Users can create the same dataset with them. See the description of the dataset for details.

    Please pay ATTENTION that the data is collected from Yahoo Finance, and the data might not be perfect. We recommend that users prepare their own data if they have a high-quality dataset. For more information, users can refer to the related document.

    Automatic update of daily frequency data (from yahoo finance)

    This step is Optional if users only want to try their models and strategies on history data.

    It is recommended that users update the data manually once (--trading_date 2021-05-25) and then set it to update automatically.

    NOTE: Users can't incrementally update data based on the offline data provided by Qlib(some fields are removed to reduce the data size). Users should use yahoo collector to download Yahoo data from scratch and then incrementally update it.

    For more information, please refer to: yahoo collector

    • Automatic update of data to the "qlib" directory each trading day(Linux)

      • use crontab: crontab -e

      • set up timed tasks:

        * * * * 1-5 python <script path> update_data_to_bin --qlib_data_1d_dir <user data dir>
        
        • script path: scripts/data_collector/yahoo/collector.py
    • Manual update of data

      python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir <user data dir> --trading_date <start date> --end_date <end date>
      
      • trading_date: start of trading day
      • end_date: end of trading day(not included)

    Docker images

    1. Pulling a docker image from a docker hub repository
      docker pull pyqlib/qlib_image_stable:stable
      
    2. Start a new Docker container
      docker run -it --name <container name> -v <Mounted local directory>:/app pyqlib/qlib_image_stable:stable
      
    3. At this point you are in the docker environment and can run the qlib scripts. An example:
      >>> python scripts/get_data.py qlib_data --name qlib_data_simple --target_dir ~/.qlib/qlib_data/cn_data --interval 1d --region cn
      >>> python qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
      
    4. Exit the container
      >>> exit
      
    5. Restart the container
      docker start -i -a <container name>
      
    6. Stop the container
      docker stop <container name>
      
    7. Delete the container
      docker rm <container name>
      
    8. If you want to know more information, please refer to the documentation.

    Auto Quant Research Workflow

    Qlib provides a tool named qrun to run the whole workflow automatically (including building the dataset, training models, backtesting, and evaluation). You can start an auto quant research workflow and get a graphical report analysis with the following steps:

    1. Quant Research Workflow: Run qrun with the LightGBM workflow config (workflow_config_lightgbm_Alpha158.yaml) as follows.

        cd examples  # Avoid running the program in a directory that contains `qlib`
        qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
      

      If users want to use qrun under debug mode, please use the following command:

      python -m pdb qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
      

      The result of qrun is as follows, please refer to Intraday Trading for more details about the result.

      
      'The following are analysis results of the excess return without cost.'
                             risk
      mean               0.000708
      std                0.005626
      annualized_return  0.178316
      information_ratio  1.996555
      max_drawdown      -0.081806
      'The following are analysis results of the excess return with cost.'
                             risk
      mean               0.000512
      std                0.005626
      annualized_return  0.128982
      information_ratio  1.444287
      max_drawdown      -0.091078
      

      Here are detailed documents for qrun and workflow.

    2. Graphical Reports Analysis: First, run python -m pip install .[analysis] to install the required dependencies. Then run examples/workflow_by_code.ipynb with jupyter notebook to get graphical reports.

      • Forecasting signal (model prediction) analysis

        • Cumulative Return of groups (figure: Cumulative Return)
        • Return distribution (figure: long_short)
        • Information Coefficient (IC) (figures: Information Coefficient, Monthly IC, IC)
        • Auto Correlation of forecasting signal (model prediction) (figure: Auto Correlation)
      • Portfolio analysis

        • Backtest return (figure: Report)
      • Explanation of above results
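
    The qrun analysis output shown earlier reports mean, std, annualized_return, information_ratio, and max_drawdown. The sketch below shows one way such figures can be derived from a series of daily excess returns. It assumes 252 periods per year and additive (non-compounded) accumulation, which is roughly consistent with the sample output (0.000708 × 252 ≈ 0.178); Qlib's own scaling may differ between versions. The function name risk_summary and the sample returns are hypothetical.

```python
import math
import statistics


def risk_summary(daily_returns, periods_per_year=252):
    """Summarize a list of daily (excess) returns with additive accumulation."""
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)  # sample standard deviation

    # Max drawdown: largest drop of the running sum below its running peak.
    cumulative = 0.0
    peak = float("-inf")
    max_drawdown = 0.0
    for r in daily_returns:
        cumulative += r
        peak = max(peak, cumulative)
        max_drawdown = min(max_drawdown, cumulative - peak)

    return {
        "mean": mean,
        "std": std,
        "annualized_return": mean * periods_per_year,
        "information_ratio": mean / std * math.sqrt(periods_per_year),
        "max_drawdown": max_drawdown,
    }


# Hypothetical daily excess returns, for illustration only.
summary = risk_summary([0.01, -0.02, 0.005, 0.015, -0.005])
for name, value in summary.items():
    print(f"{name:18s} {value: .6f}")
```

    The information ratio here is simply the annualized mean divided by the annualized standard deviation, so it is sensitive to the assumed number of periods per year.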

    Building Customized Quant Research Workflow by Code

    The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface to allow researchers to build their own workflow by code. Here is a demo for customized Quant research workflow by code.

    Main Challenges & Solutions in Quant Research

    Quant investment is a very unique scenario with lots of key challenges to be solved. Currently, Qlib provides some solutions for several of them.

    Forecasting: Finding Valuable Signals/Patterns

    Accurate forecasting of the stock price trend is a very important part of constructing profitable portfolios. However, the huge amount of data in various formats in the financial market makes it challenging to build forecasting models.

    An increasing number of SOTA Quant research works/papers, which focus on building forecasting models to mine valuable signals/patterns in complex financial data, are released in Qlib.

    Quant Model (Paper) Zoo

    Here is a list of models built on Qlib.

    Your PR of new Quant models is highly welcomed.

    The performance of each model on the Alpha158 and Alpha360 datasets can be found here.

    Run a single model

    All the models listed above are runnable with Qlib. Users can find the config files we provide and some details about the model through the benchmarks folder. More information can be retrieved at the model files listed above.

    Qlib provides three different ways to run a single model, users can pick the one that fits their cases best:

    • Users can use the tool qrun mentioned above to run a model's workflow based on a config file.

    • Users can create a workflow_by_code python script based on the one listed in the examples folder.

    • Users can use the script run_all_model.py listed in the examples folder to run a model. Here is an example of the specific shell command to be used: python run_all_model.py run --models=lightgbm, where the --models arguments can take any number of models listed above(the available models can be found in benchmarks). For more use cases, please refer to the file's docstrings.

      • NOTE: Each baseline has different environment dependencies, please make sure that your python version aligns with the requirements(e.g. TFT only supports Python 3.6~3.7 due to the limitation of tensorflow==1.15.0)
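
    Workflow configs, like the task dictionary shown in the code examples earlier, describe each component with a class, a module_path, and optional kwargs. The sketch below illustrates the general pattern of turning such an entry into an object using only the standard library. It is not Qlib's init_instance_by_config; the helper name build_from_config is hypothetical, and the example instantiates a standard-library class so that it runs without Qlib installed.

```python
import importlib


def build_from_config(config):
    """Instantiate the class named in a {"class", "module_path", "kwargs"} entry."""
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# Stand-in config: a standard-library class instead of a Qlib model.
config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}
obj = build_from_config(config)
print(type(obj).__name__, obj)
```

    Keeping the class path in data rather than in code is what lets the same runner script train different models from different config files.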

    Run multiple models

    Qlib also provides a script run_all_model.py which can run multiple models for several iterations. (Note: the script only supports Linux for now. Other operating systems will be supported in the future. It also doesn't support running the same model multiple times in parallel; this will be fixed in future development too.)

    The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as IC and backtest results will be generated and stored.

    Here is an example of running all the models for 10 iterations:

    python run_all_model.py run 10
    

    It also provides the API to run specific models at once. For more use cases, please refer to the file's docstrings.
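
    IC, mentioned above as one of the stored experiment results, is commonly computed as the correlation between predicted scores and realized returns across instruments on a given day; Rank IC applies the same correlation to ranks. The following is a minimal pure-Python sketch under that common definition, not Qlib's implementation. The function names and the sample numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_ic(pred, label):
    """Rank IC: Pearson correlation of the ranks (Spearman correlation)."""
    return pearson(average_ranks(pred), average_ranks(label))


# Hypothetical one-day cross-section: predicted scores vs. realized returns.
pred = [0.03, 0.01, -0.02, 0.05]
label = [0.02, 0.00, -0.01, 0.01]
ic = pearson(pred, label)
ric = rank_ic(pred, label)
print(f"IC={ic:.4f}  Rank IC={ric:.4f}")
```

    In practice the daily values are averaged over the test period, which is why results are usually reported as a mean IC.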

    Adapting to Market Dynamics

    Because the financial market environment is non-stationary, the data distribution may change across periods, so the performance of models built on training data decays on future test data. Adapting the forecasting models/strategies to market dynamics is therefore very important to their performance.
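
    One simple way to react to a shifting data distribution is to retrain periodically on a rolling window. The sketch below only generates the rolling train/test date ranges; it is an illustration of the idea, not Qlib's rolling tooling. The helper name rolling_segments is hypothetical, and the 2008 to 2020 range is borrowed from the training example earlier on this page.

```python
from datetime import date


def rolling_segments(first_year, last_year, train_years, test_years):
    """Yearly rolling windows: train on train_years, test on the next test_years."""
    segments = []
    start = first_year
    # Stop once the test window would run past the last available year.
    while start + train_years + test_years - 1 <= last_year:
        train_end_year = start + train_years - 1
        test_end_year = train_end_year + test_years
        segments.append({
            "train": (date(start, 1, 1), date(train_end_year, 12, 31)),
            "test": (date(train_end_year + 1, 1, 1), date(test_end_year, 12, 31)),
        })
        start += test_years  # roll forward by the length of the test window
    return segments


windows = rolling_segments(2008, 2020, train_years=7, test_years=2)
for w in windows:
    print(w["train"][0], w["train"][1], "->", w["test"][0], w["test"][1])
```

    Each model is then fitted on its own train range and only used for predictions inside the matching test range, so no window sees data from its own future.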

    Here is a list of solutions built on Qlib.

    Reinforcement Learning: modeling continuous decisions

    Qlib now supports reinforcement learning, a feature designed to model continuous investment decisions. This functionality assists investors in optimizing their trading strategies by learning from interactions with the environment to maximize some notion of cumulative reward.
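
    A common way to make "cumulative reward" precise is the discounted return, G = r_0 + γ·r_1 + γ²·r_2 + ..., where γ is a discount factor between 0 and 1. The snippet below is a generic sketch of that formula; it is not taken from Qlib's RL framework, and the rewards and discount factor are illustrative values.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical per-step rewards of one trading episode.
rewards = [1.0, 0.0, -0.5, 2.0]
print(discounted_return(rewards, gamma=0.5))
print(discounted_return(rewards, gamma=1.0))
```

    With gamma=1.0 the result is the plain sum of rewards; smaller values make the agent favor rewards that arrive earlier.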

    Here is a list of solutions built on Qlib categorized by scenarios.

    RL for order execution

    Here is the introduction of this scenario. All the methods below are compared here.

    Quant Dataset Zoo

    Dataset plays a very important role in Quant. Here is a list of the datasets built on Qlib:

    Dataset | US Market | China Market
    Alpha360 | √ | √
    Alpha158 | √ | √

    Here is a tutorial on building a dataset with Qlib. Your PR to build a new Quant dataset is highly welcomed.

    Learning Framework

    Qlib is highly customizable and a lot of its components are learnable. The learnable components are instances of Forecast Model and Trading Agent. They are learned based on the Learning Framework layer and then applied to multiple scenarios in the Workflow layer. The learning framework leverages the Workflow layer as well (e.g. sharing Information Extractor, creating environments based on Execution Env).

    Based on learning paradigms, they can be categorized into reinforcement learning and supervised learning.

    • For supervised learning, the detailed docs can be found here.
    • For reinforcement learning, the detailed docs can be found here. Qlib's RL learning framework leverages Execution Env in the Workflow layer to create environments. It's worth noting that NestedExecutor is supported as well. This lets users optimize different levels of strategies/models/agents together (e.g. optimizing an order execution strategy for a specific portfolio management strategy).

    More About Qlib

    If you want a quick look at the most frequently used components of Qlib, you can try the notebooks here.

    The detailed documents are organized in docs. Sphinx and the readthedocs theme are required to build the documentation in HTML format.

    cd docs/
    conda install sphinx sphinx_rtd_theme -y
    # Otherwise, you can install them with pip
    # pip install sphinx sphinx_rtd_theme
    make html
    

    You can also view the latest documentation online directly.

    Qlib is in active and continuing development. Our plan is in the roadmap, which is managed as a GitHub project.

    Offline Mode and Online Mode

    The data server of Qlib can be deployed in either Offline mode or Online mode. The default mode is Offline mode.

    Under Offline mode, the data will be deployed locally.

    Under Online mode, the data will be deployed as a shared data service. The data and their cache will be shared by all the clients. Data retrieval performance is expected to improve because of a higher cache hit rate, and less disk space will be consumed. The documentation for the online mode can be found in Qlib-Server. The online mode can be deployed automatically with Azure CLI based scripts. The source code of the online data server can be found in the Qlib-Server repository.
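    The cache-hit argument can be illustrated with a small simulation. This is a toy model, not Qlib-Server code: the function `hit_rate` and the request lists are made up for the example. It counts hits when each client keeps a private cache and when all clients share one.

```python
# Toy simulation (not Qlib-Server code): private caches vs. one shared cache.

def hit_rate(requests_by_client, shared):
    """Fraction of requests served from cache."""
    shared_cache = set()
    hits = total = 0
    for client_requests in requests_by_client:
        # Each client either reuses the shared cache or starts with its own empty one.
        cache = shared_cache if shared else set()
        for key in client_requests:
            total += 1
            if key in cache:
                hits += 1
            else:
                cache.add(key)  # a miss populates the cache
    return hits / total

# Hypothetical field requests from three clients with overlapping needs.
requests = [
    ["close", "volume", "close"],
    ["close", "open"],
    ["volume", "close"],
]

private_rate = hit_rate(requests, shared=False)
shared_rate = hit_rate(requests, shared=True)
```

    With these requests the private caches serve 1 of 7 requests from cache and the shared cache serves 4 of 7, because later clients reuse data that earlier clients already loaded.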

    Performance of Qlib Data Server

    The performance of data processing is important to data-driven methods like AI technologies. As an AI-oriented platform, Qlib provides a solution for data storage and data processing. To demonstrate the performance of the Qlib data server, we compare it with several other data storage solutions.

    We evaluate the performance of several storage solutions by finishing the same task, which creates a dataset (14 features/factors) from the basic OHLCV daily data of a stock market (800 stocks each day from 2007 to 2020). The task involves data queries and processing.

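    | | HDF5 | MySQL | MongoDB | InfluxDB | Qlib -E -D | Qlib +E -D | Qlib +E +D |
    | -- | -- | -- | -- | -- | -- | -- | -- |
    | Total (1CPU) (seconds) | 184.4±3.7 | 365.3±7.5 | 253.6±6.7 | 368.2±3.6 | 147.0±8.8 | 47.6±1.0 | 7.4±0.3 |
    | Total (64CPU) (seconds) | | | | | 8.8±0.6 | 4.2±0.2 | |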
    • +(-)E indicates with (out) ExpressionCache
    • +(-)D indicates with (out) DatasetCache
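    To make the single-CPU comparison easier to read, the mean timings reported above can be turned into speedup ratios relative to Qlib with both caches enabled. The snippet uses only the mean values from the 1CPU row and ignores the ± spreads; the variable names are chosen for this example.

```python
# Speedup of "Qlib +E +D" over each alternative, from the 1CPU mean timings above.

timings_1cpu = {
    "HDF5": 184.4,
    "MySQL": 365.3,
    "MongoDB": 253.6,
    "InfluxDB": 368.2,
    "Qlib -E -D": 147.0,
    "Qlib +E -D": 47.6,
    "Qlib +E +D": 7.4,
}

baseline = timings_1cpu["Qlib +E +D"]

# Ratio > 1 means the solution is that many times slower than the baseline.
speedups = {name: round(seconds / baseline, 1) for name, seconds in timings_1cpu.items()}

for name, ratio in speedups.items():
    print(f"{name:<12} {ratio:>5.1f}x")
```

    On these numbers, Qlib with both caches is roughly 25 times faster than HDF5 and about 50 times faster than MySQL or InfluxDB on a single CPU.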

    Most general-purpose databases take too much time to load data. After looking into the underlying implementation, we find that data go through too many layers of interfaces and unnecessary format transformations in general-purpose database solutions. Such overheads greatly slow down the data loading process. Qlib data are stored in a compact format that can be combined efficiently into arrays for scientific computation.
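    The idea of a compact, array-friendly layout can be sketched with Python's standard `array` module. This is a generic illustration and does not reproduce Qlib's actual on-disk format: the values and variable names are invented for the example.

```python
# Generic illustration (not Qlib's file format): a flat binary buffer of
# float32 values loads straight into an array, with no per-row parsing.
from array import array

prices = [10.5, 11.25, 12.0, 11.75]  # values chosen to be exact in float32

# Serialize: one fixed-width float32 per value, no delimiters or row metadata.
raw = array("f", prices).tobytes()

# Deserialize: the bytes are copied directly into a typed array.
loaded = array("f")
loaded.frombytes(raw)

bytes_per_value = len(raw) // len(prices)
```

    Because every value has the same width, a reader can also seek directly to a given position instead of scanning earlier records.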

    Related Reports

    Contact Us

    • If you have any issues, please create an issue here or send messages in gitter.
    • If you want to make contributions to Qlib, please create pull requests.
    • For other reasons, you are welcome to contact us by email (qlib@microsoft.com).
      • We are recruiting new members (both FTEs and interns). Your resumes are welcome!

    Join IM discussion groups:

    Gitter

    Contributing

    We appreciate all contributions and thank all the contributors!

    Before we released Qlib as an open-source project on GitHub in Sep 2020, Qlib was an internal project in our group. Unfortunately, the internal commit history was not kept. Many members of our group have also contributed a lot to Qlib, including Ruihua Wang, Yinda Zhang, Haisu Yu, Shuyu Wang, Bochen Pang, and Dong Zhou. Special thanks to Dong Zhou for his initial version of Qlib.

    Guidance

    This project welcomes contributions and suggestions.
    Here are some code standards and development guidance for submitting a pull request.

    Making contributions is not hard. Solving an issue (maybe just answering a question raised in the issues list or gitter), fixing or reporting a bug, improving the documents and even fixing a typo are all important contributions to Qlib.

    For example, if you want to contribute to Qlib's document/code, you can follow the steps in the figure below.

    If you don't know how to start to contribute, you can refer to the following examples.

    | Type | Examples |
    | -- | -- |
    | Solving issues | Answer a question; issuing or fixing a bug |
    | Docs | Improve docs quality; Fix a typo |
    | Feature | Implement a requested feature like this; Refactor interfaces |
    | Dataset | Add a dataset |
    | Models | Implement a new model, some instructions to contribute models |

    Issues labelled as good first issues are easy starting points for your contributions.

    You can find some imperfect implementations in Qlib by running rg 'TODO|FIXME' qlib

    If you would like to become one of Qlib's maintainers to contribute more (e.g. help merge PRs, triage issues), please contact us by email (qlib@microsoft.com). We are glad to help upgrade your permissions.

    Licence

    Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the right to use your contribution. For details, visit https://cla.opensource.microsoft.com.

    When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

    This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.