CogDL

CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)

1,784

310


Top Related Projects

dgl

13,929

Python package built to ease deep learning on graph, on top of existing DL frameworks.

pytorch_geometric

22,528

Graph Neural Network Library for PyTorch

ogb

2,011

Benchmark datasets, data loaders, and evaluators for graph machine learning

PyTorch-BigGraph

3,413

Generate embeddings from large-scale graph-structured data.

Quick Overview

CogDL is an extensive and efficient graph representation learning toolkit for researchers and developers. It provides a unified framework for implementing and evaluating various graph representation learning methods, including node classification, link prediction, and graph classification tasks.

Pros

Comprehensive collection of graph learning models and datasets
Easy-to-use API for quick experimentation and benchmarking
Efficient implementation with GPU acceleration
Extensible architecture for adding custom models and tasks

Cons

Steep learning curve for users unfamiliar with graph representation learning
Documentation could be more detailed and user-friendly
Limited support for dynamic graphs and temporal data
Some advanced features may require in-depth knowledge of the underlying algorithms

Code Examples

Node classification using GCN:

from cogdl import experiment

experiment(task="node_classification", dataset="cora", model="gcn")

Link prediction using DeepWalk:

from cogdl import experiment

experiment(task="link_prediction", dataset="ppi", model="deepwalk")

Graph classification using GraphSAGE:

from cogdl import experiment

experiment(task="graph_classification", dataset="mutag", model="graphsage")

Custom model implementation:

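import torch.nn.functional as F
from cogdl.models import BaseModel
from cogdl.layers import GCNLayer

class CustomGCN(BaseModel):
    """Two-layer GCN assembled from CogDL's GCNLayer."""

    def __init__(self, in_feats, hidden_size, out_feats):
        super(CustomGCN, self).__init__()
        self.conv1 = GCNLayer(in_feats, hidden_size)
        self.conv2 = GCNLayer(hidden_size, out_feats)

    def forward(self, graph, x):
        # Nonlinearity between the layers; without it the model stays purely linear.
        x = F.relu(self.conv1(graph, x))
        # The second layer returns raw class scores (logits).
        x = self.conv2(graph, x)
        return x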

Getting Started

To get started with CogDL, follow these steps:

Install CogDL:

pip install cogdl

Run a simple experiment:

from cogdl import experiment

# Node classification on Cora dataset using GCN
result = experiment(task="node_classification", dataset="cora", model="gcn")
print(result)

For more advanced usage, refer to the CogDL documentation and examples in the GitHub repository.

Competitor Comparisons

dgl

13,929

Python package built to ease deep learning on graph, on top of existing DL frameworks.

Pros of DGL

More extensive and mature ecosystem with better documentation
Supports multiple deep learning frameworks (PyTorch, MXNet, TensorFlow)
Higher performance and scalability for large-scale graph datasets

Cons of DGL

Steeper learning curve due to more complex API
Heavier dependency requirements
Less focus on ease of use for quick prototyping

Code Comparison

CogDL:

from cogdl import experiment

experiment(task="node_classification", dataset="cora", model="gcn")

DGL:

import dgl
import torch.nn as nn
import torch.nn.functional as F

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = dgl.nn.GraphConv(in_feats, h_feats)
        self.conv2 = dgl.nn.GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = F.relu(self.conv1(g, in_feat))
        h = self.conv2(g, h)
        return h

pytorch_geometric

22,528

Graph Neural Network Library for PyTorch

Pros of PyTorch Geometric

More extensive and diverse set of graph neural network models and layers
Larger and more active community, leading to better support and more frequent updates
Better integration with PyTorch ecosystem and seamless GPU acceleration

Cons of PyTorch Geometric

Steeper learning curve for beginners due to its extensive feature set
Can be more resource-intensive for large-scale graph processing

Code Comparison

CogDL:

from cogdl import experiment

experiment(task="node_classification", dataset="cora", model="gcn")

PyTorch Geometric:

import torch
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/Cora', name='Cora')
model = GCNConv(dataset.num_features, dataset.num_classes)

Both libraries offer high-level APIs for graph-based machine learning tasks, but PyTorch Geometric provides more flexibility and control over model architecture and training process. CogDL focuses on ease of use and quick experimentation, while PyTorch Geometric offers a more comprehensive toolkit for advanced graph neural network development.

graph_nets

5,384

Build Graph Nets in Tensorflow

Pros of graph_nets

More focused on graph neural networks and deep learning for graphs
Better integration with TensorFlow and other Google AI tools
More extensive documentation and examples for various graph-based tasks

Cons of graph_nets

Less comprehensive in terms of graph algorithms and traditional network analysis
More complex setup and usage, especially for those not familiar with TensorFlow
Limited support for non-deep learning graph tasks

Code Comparison

graph_nets:

import graph_nets as gn
import tensorflow as tf

graph = gn.graphs.GraphsTuple(...)
model = gn.modules.GraphNetwork(...)
output = model(graph)

CogDL:

from cogdl import experiment

experiment(task="node_classification", dataset="cora", model="gcn")

The graph_nets code shows a more low-level approach, allowing for custom graph construction and model definition. CogDL, on the other hand, provides a higher-level API for running experiments with predefined tasks, datasets, and models.

ogb

2,011

Benchmark datasets, data loaders, and evaluators for graph machine learning

Pros of OGB

Focuses on standardized benchmark datasets and evaluation protocols for graph machine learning
Provides a wide range of graph datasets across various domains and tasks
Offers easy-to-use data loaders and evaluators for consistent benchmarking

Cons of OGB

Limited in terms of implemented graph learning models and algorithms
Primarily designed for benchmarking rather than providing a comprehensive graph learning toolkit
May require additional libraries for model implementation and training

Code Comparison

OGB:

from ogb.nodeproppred import NodePropPredDataset

dataset = NodePropPredDataset(name="ogbn-arxiv")
graph, label = dataset[0]

CogDL:

from cogdl import experiment

experiment(task="node_classification", dataset="cora", model="gcn")

Summary

OGB excels in providing standardized graph datasets and evaluation metrics, making it ideal for benchmarking graph learning models. CogDL, on the other hand, offers a more comprehensive toolkit for graph representation learning, including various models and algorithms. While OGB focuses on data handling and evaluation, CogDL provides an end-to-end solution for graph learning tasks.

PyTorch-BigGraph

3,413

Generate embeddings from large-scale graph-structured data.

Pros of PyTorch-BigGraph

Designed specifically for large-scale graph embeddings, handling billions of nodes and edges efficiently
Supports multi-entity and multi-relation graphs, making it versatile for complex network structures
Offers distributed training capabilities, enabling faster processing on multiple machines

Cons of PyTorch-BigGraph

Focused primarily on graph embeddings, lacking broader graph analysis tools
Steeper learning curve due to its specialized nature and distributed computing features
Less extensive documentation and examples compared to CogDL

Code Comparison

PyTorch-BigGraph:

from torchbiggraph.config import parse_config
from torchbiggraph.train import train
from torchbiggraph.util import SubprocessInitializer

config = parse_config(config_dict)
train(config, rank=0, subprocess_init=SubprocessInitializer())

CogDL:

from cogdl import experiment

experiment(task="node_classification", dataset="cora", model="gcn")

PyTorch-BigGraph focuses on configuring and training large-scale graph embeddings, while CogDL provides a more straightforward API for various graph-related tasks.


README

CogDL is a graph deep learning toolkit that allows researchers and developers to easily train and compare baseline or customized models for node classification, graph classification, and other important tasks in the graph domain.

We summarize the contributions of CogDL as follows:

Efficiency: CogDL uses well-optimized operators to speed up training and reduce the GPU memory used by GNN models.
Ease of Use: CogDL provides easy-to-use APIs for running experiments with the given models and datasets using hyper-parameter search.
Extensibility: The design of CogDL makes it easy to apply GNN models to new scenarios based on our framework.

News

The CogDL paper was accepted by WWW 2023. Find us at WWW 2023! The new v0.6 release adds more examples of graph self-supervised learning, including GraphMAE, GraphMAE2, and BGRL.
A free GNN course provided by the CogDL team is available at this link. We also provide a discussion forum for Chinese users.
The new v0.5.3 release supports mixed-precision training by setting fp16=True and provides a basic example written in Jittor. It also updates the tutorial in the documentation, fixes the download links of some datasets, and fixes potential bugs in operators.

News History

The new v0.5.2 release adds a GNN example for ogbn-products and updates geom datasets. It also fixes some potential bugs including setting devices, using cpu for inference, etc.
The new v0.5.1 release adds fast operators including SpMM (CPU version) and scatter_max (CUDA version). It also adds many datasets for node classification, which can be found at this link.
The new v0.5.0 release designs and implements a unified training loop for GNNs. It introduces DataWrapper to help prepare the training/validation/test data and ModelWrapper to define the training/validation/test steps.
The new v0.4.1 release adds the implementation of Deep GNNs and the recommendation task. It also supports new pipelines for generating embeddings and recommendation. You are welcome to join our tutorial at KDD 2021 at 10:30 am - 12:00 am, Aug. 14th (Singapore Time). More details can be found at https://kdd2021graph.github.io/.
The new v0.4.0 release refactors the data storage (from Data to Graph) and provides additional fast operators to speed up GNN training. It also includes many self-supervised learning methods on graphs. We are also glad to announce that we will give a tutorial at KDD 2021 in August. Please see this link for more details.
CogDL supports GNN models with Mixture of Experts (MoE). You can install FastMoE and try MoE GCN in CogDL now!
The new v0.3.0 release provides a fast SpMM operator to speed up GNN training. We also released the first version of the CogDL paper on arXiv. You can join our Slack for discussion.
The new v0.2.0 release includes easy-to-use experiment and pipeline APIs for all experiments and applications. The experiment API supports AutoML hyper-parameter search. This release also provides the OAGBert API for model inference (OAGBert is trained on a large-scale academic corpus by our lab). Some features and models were added by the open source community (thanks to all the contributors).
The new v0.1.2 release includes a pre-training task, many examples, OGB datasets, some knowledge graph embedding methods, and some graph neural network models. The coverage of CogDL is increased to 80%. Some new APIs, such as Trainer and Sampler, are developed and being tested.
The new v0.1.1 release includes the knowledge link prediction task, many state-of-the-art models, and optuna support. We also have a Chinese WeChat post about the CogDL release.

Getting Started

Requirements and Installation

Python version >= 3.7
PyTorch version >= 1.7.1

Please follow the instructions here to install PyTorch (https://github.com/pytorch/pytorch#installation).

When PyTorch has been installed, cogdl can be installed using pip as follows:

pip install cogdl

Install from source via:

pip install git+https://github.com/thudm/cogdl.git

Or clone the repository and install with the following commands:

git clone git@github.com:THUDM/cogdl.git
cd cogdl
pip install -e .

Usage

API Usage

You can run all kinds of experiments through the CogDL APIs, especially experiment. You can also use your own datasets and models for experiments. A quickstart example can be found in quick_start.py. More examples are provided in examples/.

from cogdl import experiment

# basic usage
experiment(dataset="cora", model="gcn")

# set other hyper-parameters
experiment(dataset="cora", model="gcn", hidden_size=32, epochs=200)

# run over multiple models on different seeds
experiment(dataset="cora", model=["gcn", "gat"], seed=[1, 2])

# automl usage
def search_space(trial):
    return {
        "lr": trial.suggest_categorical("lr", [1e-3, 5e-3, 1e-2]),
        "hidden_size": trial.suggest_categorical("hidden_size", [32, 64, 128]),
        "dropout": trial.suggest_uniform("dropout", 0.5, 0.8),
    }

experiment(dataset="cora", model="gcn", seed=[1, 2], search_space=search_space)
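To see what the search_space function above hands back, it can be called with a stand-in trial object. The sketch below is only an illustration: StubTrial is a hypothetical class written for this example (it is not Optuna's Trial and not part of CogDL), and it samples with Python's random module rather than with a real hyper-parameter search algorithm.

```python
import random


class StubTrial:
    """Hypothetical stand-in for a trial object; NOT Optuna's Trial class."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.params = {}

    def suggest_categorical(self, name, choices):
        # Pick one of the listed values uniformly at random.
        value = self._rng.choice(choices)
        self.params[name] = value
        return value

    def suggest_uniform(self, name, low, high):
        # Draw a float between low and high.
        value = self._rng.uniform(low, high)
        self.params[name] = value
        return value


def search_space(trial):
    # Same search space as in the README example above.
    return {
        "lr": trial.suggest_categorical("lr", [1e-3, 5e-3, 1e-2]),
        "hidden_size": trial.suggest_categorical("hidden_size", [32, 64, 128]),
        "dropout": trial.suggest_uniform("dropout", 0.5, 0.8),
    }


# Each call yields one candidate configuration (a plain dict of keyword values).
candidates = [search_space(StubTrial(seed)) for seed in range(3)]
for params in candidates:
    print(params)
```

Each dictionary is one candidate set of hyper-parameters. Note that recent Optuna releases deprecate suggest_uniform in favor of suggest_float, so the README snippet may emit a deprecation warning depending on the installed version.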

Command-Line Usage

You can also use python scripts/train.py --dataset example_dataset --model example_model to run example_model on example_dataset.

--dataset, dataset name to run; can be a list of datasets separated by spaces, such as cora citeseer. Supported datasets include 'cora', 'citeseer', 'pubmed', 'ppi', 'wikipedia', 'blogcatalog', 'flickr'. More datasets can be found in cogdl/datasets.
--model, model name to run; can be a list of models separated by spaces, such as gcn gat. Supported models include 'gcn', 'gat', 'graphsage', 'deepwalk', 'node2vec', 'hope', 'grarep', 'netmf', 'netsmf', 'prone'. More models can be found in cogdl/models.
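The space-separated lists accepted by these options can be pictured with a small argparse sketch. This is an assumption about the general pattern only: build_parser and expand_runs are hypothetical helpers written for this illustration, and the real scripts/train.py may define and combine its options differently.

```python
import argparse
from itertools import product


def build_parser():
    # Hypothetical parser; it only mirrors the option names described above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", nargs="+", required=True)
    parser.add_argument("--model", nargs="+", required=True)
    parser.add_argument("--seed", nargs="+", type=int, default=[1])
    return parser


def expand_runs(argv):
    # nargs="+" collects space-separated values into lists; the Cartesian
    # product then enumerates one run per (dataset, model, seed) combination.
    args = build_parser().parse_args(argv)
    return list(product(args.dataset, args.model, args.seed))


runs = expand_runs("--dataset cora citeseer --model gcn gat --seed 0 1".split())
print(len(runs))
print(runs[0])
```

With two datasets, two models, and two seeds, the sketch enumerates eight runs, one per combination.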

For example, if you want to run GCN and GAT on the Cora dataset, with 5 different seeds:

python scripts/train.py --dataset cora --model gcn gat --seed 0 1 2 3 4

Expected output:

Variant	test_acc	val_acc
('cora', 'gcn')	0.8050±0.0047	0.7940±0.0063
('cora', 'gat')	0.8234±0.0042	0.8088±0.0016
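Each cell in the table above reads as a mean and a spread over the seeds. A minimal sketch of how such a summary could be produced follows; the per-seed accuracies are made-up placeholder values, and the use of the population standard deviation is an assumption, since the source does not say which estimator CogDL reports.

```python
from statistics import mean, pstdev


def summarize(scores):
    # Format per-seed scores as "mean±std" with four decimals.
    # Assumption: population standard deviation (pstdev), not the sample one.
    return f"{mean(scores):.4f}±{pstdev(scores):.4f}"


# Made-up accuracies for five seeds, for illustration only.
test_acc = [0.80, 0.81, 0.82, 0.80, 0.81]
print(summarize(test_acc))
```

Running more seeds generally gives a more trustworthy estimate of the spread, which is why the example command passes five of them.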

If you have ANY difficulties getting things working in the above steps, feel free to open an issue. You can expect a reply within 24 hours.

FAQ

How to contribute to CogDL?

If you have a well-performing algorithm and are willing to implement it in our toolkit to help more people, first open an issue and then create a pull request. Detailed information can be found here.

Before committing your modification, please first run pre-commit install to setup the git hook for checking code format and style using black and flake8. Then the pre-commit will run automatically on git commit! Detailed information of pre-commit can be found here.

How to enable fast GNN training?

CogDL provides a fast sparse matrix-matrix multiplication operator called [GE-SpMM](https://arxiv.org/abs/2007.03179) to speed up training of GNN models on the GPU. The feature will be automatically used if it is available. Note that this feature is still in testing and may not work under some versions of CUDA.
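For readers unfamiliar with the operation, the following pure-Python sketch shows what a sparse matrix-matrix multiplication computes when the sparse matrix is a graph adjacency stored in CSR form. It is a reference illustration only, not GE-SpMM and not CogDL's operator; spmm_csr and the tiny graph are made up for this example.

```python
def spmm_csr(indptr, indices, data, dense):
    """Multiply a CSR sparse matrix by a dense matrix given as a list of rows."""
    n_rows = len(indptr) - 1
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # Non-zeros of this row live in positions indptr[row]..indptr[row + 1] - 1.
        for k in range(indptr[row], indptr[row + 1]):
            col, weight = indices[k], data[k]
            for j in range(n_cols):
                out[row][j] += weight * dense[col][j]
    return out


# Path graph 0 - 1 - 2 with an unweighted, symmetric adjacency matrix.
indptr = [0, 1, 3, 4]
indices = [1, 0, 2, 1]
data = [1.0, 1.0, 1.0, 1.0]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

# Each output row is the sum of the feature rows of that node's neighbors.
aggregated = spmm_csr(indptr, indices, data, features)
print(aggregated)
```

This neighbor aggregation is the step that dominates GNN training time on large graphs, which is why a tuned GPU kernel for it matters.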

How to run parallel experiments with GPUs on several models?

To run experiments in parallel on a server with multiple GPUs for several models, for example GCN and GAT on the Cora dataset:

$ python scripts/train.py --dataset cora --model gcn gat --hidden-size 64 --devices 0 1 --seed 0 1 2 3 4

Expected output:

Variant	Acc
('cora', 'gcn')	0.8236±0.0033
('cora', 'gat')	0.8262±0.0032

How to use models from other libraries?

If you are familiar with other popular graph libraries, you can implement your own model in CogDL using modules from PyTorch Geometric (PyG). For the installation of PyG, you can follow the instructions from PyG (https://github.com/rusty1s/pytorch_geometric/#installation). For the quick-start usage of how to use layers of PyG, you can find some examples in the [examples/pyg](https://github.com/THUDM/cogdl/tree/master/examples/pyg/).

How to make a successful pull request with a unit test?

To have a successful pull request, you need to have at least (1) your model implementation and (2) a unit test.

You might wonder why your pull request was rejected with a 'Coverage decreased ...' issue even though your model works fine locally. This happens when you have not included a unit test that runs through the extra lines of code you added. The Travis CI service used on GitHub runs all unit tests on the code you committed and checks how many lines of the code are exercised by them. If a significant portion of your code is not exercised (insufficient coverage), the pull request is rejected.

So how do you do a unit test?

Let's say you implement a GNN model in a script models/nn/abcgnn.py that performs node classification. Then you need to add a unit test inside the script tests/tasks/test_node_classification.py (or whichever test script matches the task your model performs).
To add the unit test, add a function test_abcgnn_cora() (following the format of the other unit tests already in the script) and fill it with the required arguments. The last line in the function, 'assert 0 <= ret["Acc"] <= 1', is the very basic sanity check conducted by the unit test.
After modifying tests/tasks/test_node_classification.py, commit it together with your models/nn/abcgnn.py and your pull request should pass.
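The shape of such a test can be sketched as follows. This is only an outline under stated assumptions: run_node_classification is a hypothetical stub that returns a fixed placeholder metric so the sketch runs on its own, whereas a real test would follow the existing tests in tests/tasks/test_node_classification.py and actually train the model.

```python
def run_node_classification(model, dataset):
    # Hypothetical stub standing in for building the arguments and training
    # the model; it returns a fixed placeholder metric, not a real result.
    return {"Acc": 0.5}


def test_abcgnn_cora():
    ret = run_node_classification(model="abcgnn", dataset="cora")
    # Basic sanity check from the text: accuracy must be a valid fraction.
    assert 0 <= ret["Acc"] <= 1


# A test runner such as pytest would discover and call this automatically.
test_abcgnn_cora()
```

The check is deliberately weak: its purpose is to make the coverage tool execute the new model's code path, not to verify a particular accuracy.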

CogDL Team

CogDL is developed and maintained by Tsinghua, ZJU, DAMO Academy, and ZHIPU.AI.

The core development team can be reached at cogdlteam@gmail.com.

Citing CogDL

Please cite our paper if you find our code or results useful for your research:

@inproceedings{cen2023cogdl,
    title={CogDL: A Comprehensive Library for Graph Deep Learning},
    author={Yukuo Cen and Zhenyu Hou and Yan Wang and Qibin Chen and Yizhen Luo and Zhongming Yu and Hengrui Zhang and Xingcheng Yao and Aohan Zeng and Shiguang Guo and Yuxiao Dong and Yang Yang and Peng Zhang and Guohao Dai and Yu Wang and Chang Zhou and Hongxia Yang and Jie Tang},
    booktitle={Proceedings of the ACM Web Conference 2023 (WWW'23)},
    year={2023}
}
