DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

3,998

723

Top Related Projects

OpenNRE

4,412

An Open-Source Package for Neural Relation Extraction (NRE)

PaddleNLP

12,655

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

dgl

13,929

Python package built to ease deep learning on graph, on top of existing DL frameworks.

PyTorch-BigGraph

3,413

Generate embeddings from large-scale graph-structured data.

OpenNMT-py

6,923

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

allennlp

11,862

An open-source NLP research library, built on PyTorch.

Quick Overview

DeepKE is an open-source knowledge extraction toolkit supporting low-resource, document-level, and multimodal scenarios. It provides a unified framework for various knowledge extraction tasks, including named entity recognition, relation extraction, and attribute extraction. DeepKE aims to make knowledge extraction more accessible and efficient for researchers and practitioners.

Pros

Supports multiple knowledge extraction tasks in a unified framework
Offers pre-trained models and easy-to-use interfaces for quick deployment
Provides support for low-resource scenarios, making it useful for languages or domains with limited data
Includes multimodal capabilities, allowing for knowledge extraction from text and images

Cons

May have a steeper learning curve for users unfamiliar with knowledge extraction concepts
Documentation could be more comprehensive, especially for advanced use cases
Limited support for languages other than English and Chinese
Performance may vary depending on the specific task and dataset

Code Examples

Named Entity Recognition (NER):

from deepke.name_entity_re import NERModel

# Load pre-trained NER model
model = NERModel("bert", "bert-base-chinese", labels=["PER", "ORG", "LOC"])

# Perform NER on a given text
text = "张三在北京大学工作"
result = model.predict(text)
print(result)

Relation Extraction:

from deepke.relation_extraction import REModel

# Load pre-trained relation extraction model
model = REModel("bert", "bert-base-chinese")

# Extract relations from a sentence
sentence = "苹果公司的总部位于加利福尼亚州"
subject = "苹果公司"
obj = "加利福尼亚州"  # named obj to avoid shadowing the built-in `object`
result = model.predict(sentence, subject, obj)
print(result)

Attribute Extraction:

from deepke.attribute_extraction import AEModel

# Load pre-trained attribute extraction model
model = AEModel("bert", "bert-base-chinese")

# Extract attributes from a given text
text = "这款手机的屏幕尺寸为6.1英寸，电池容量为3000mAh"
result = model.predict(text)
print(result)

Getting Started

To get started with DeepKE, follow these steps:

Install DeepKE:

pip install deepke

Import the desired module:

from deepke.name_entity_re import NERModel
from deepke.relation_extraction import REModel
from deepke.attribute_extraction import AEModel

Load a pre-trained model and use it for prediction:

model = NERModel("bert", "bert-base-chinese", labels=["PER", "ORG", "LOC"])
result = model.predict("张三在北京大学工作")
print(result)

For more detailed instructions and advanced usage, refer to the official DeepKE documentation.

Competitor Comparisons

OpenNRE

4,412

An Open-Source Package for Neural Relation Extraction (NRE)

Pros of OpenNRE

Focuses specifically on neural relation extraction, providing a more specialized toolkit
Offers pre-trained models for quick deployment and testing
Includes a comprehensive evaluation module for model performance analysis

Cons of OpenNRE

Limited to relation extraction tasks, while DeepKE covers a broader range of knowledge extraction tasks
Less flexibility in terms of customization and integration with other NLP tasks
Smaller community and fewer updates compared to DeepKE

Code Comparison

OpenNRE:

from opennre import encoder, model, framework

# Define the model
model = model.SoftmaxNN(
    ckpt='bert-base-uncased',
    encoder=encoder.BERTEncoder(max_length=80)
)

# Load data and train
framework.train_model(model, train_path='train.txt', val_path='val.txt')

DeepKE:

from deepke import extraction

# Initialize and train the model
extraction.set_config(task='relation_extraction', model='bert')
extraction.train()

# Predict using the trained model
extraction.predict(text="Apple Inc. was founded by Steve Jobs.")

PaddleNLP

12,655

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

Pros of PaddleNLP

Comprehensive NLP toolkit with a wide range of pre-trained models and datasets
Seamless integration with PaddlePaddle deep learning framework
Extensive documentation and tutorials for ease of use

Cons of PaddleNLP

Primarily focused on Chinese NLP tasks, which may limit its applicability for other languages
Steeper learning curve for users not familiar with PaddlePaddle ecosystem

Code Comparison

PaddleNLP:

from paddlenlp import Taskflow

ner = Taskflow("ner")
result = ner("华为是一家总部位于广东省深圳市的中国大型通信设备公司")
print(result)

DeepKE:

from deepke.name_entity_re import NER

ner = NER("bert")
result = ner.predict("华为是一家总部位于广东省深圳市的中国大型通信设备公司")
print(result)

Both repositories provide tools for natural language processing tasks, with a focus on named entity recognition in this example. PaddleNLP offers a more comprehensive toolkit within the PaddlePaddle ecosystem, while DeepKE provides a specialized framework for knowledge extraction tasks. The choice between them depends on the specific requirements of the project and familiarity with the respective ecosystems.

dgl

13,929

Python package built to ease deep learning on graph, on top of existing DL frameworks.

Pros of DGL

Broader scope: Focuses on general graph neural networks, applicable to various domains
More mature project with larger community and extensive documentation
Supports multiple deep learning frameworks (PyTorch, MXNet, TensorFlow)

Cons of DGL

Steeper learning curve due to its more general-purpose nature
May require more code for specific knowledge extraction tasks
Less specialized for knowledge graph and relation extraction tasks

Code Comparison

DeepKE example (named entity recognition):

from deepke.name_entity_re.standard import *

model = NERModel('bert', 'bert-base-uncased', num_labels=9)
model.train_model(train_data)
predictions = model.predict(test_data)

DGL example (graph neural network):

import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCN(nn.Module):
    """Two-layer graph convolutional network."""
    def __init__(self, in_feats, h_feats, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)  # non-linearity between the two graph convolutions
        h = self.conv2(g, h)
        return h

PyTorch-BigGraph

3,413

Generate embeddings from large-scale graph-structured data.

Pros of PyTorch-BigGraph

Designed for large-scale graph embedding, capable of handling billions of nodes and edges
Supports distributed training across multiple machines for improved performance
Offers a variety of loss functions and edge sampling techniques

Cons of PyTorch-BigGraph

Focused primarily on graph embeddings, less versatile for other NLP tasks
Steeper learning curve due to its specialized nature and distributed computing features
Less active development and community support compared to DeepKE

Code Comparison

PyTorch-BigGraph:

config = torchbiggraph.config.parse_config({
    'entities': {'all': {'num_partitions': 1}},
    'relations': [{'name': 'all', 'lhs': 'all', 'rhs': 'all'}],
    'dimension': 100,
    'max_epochs': 50,
    'num_batch_negs': 1000,
    'num_uniform_negs': 1000,
})

DeepKE:

config = {
    "model_name": "bert-base-uncased",
    "max_seq_len": 128,
    "batch_size": 32,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
}
model = NERModel("bert", "bert-base-uncased", args=config)

OpenNMT-py

6,923

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Pros of OpenNMT-py

More mature and widely adopted project with extensive documentation
Supports a broader range of neural machine translation architectures
Active community and regular updates

Cons of OpenNMT-py

Focused primarily on machine translation, less versatile for other NLP tasks
Steeper learning curve for beginners in NLP

Code Comparison

OpenNMT-py:

import onmt

# Define model parameters
model_opts = {"model_type": "transformer", "src_vocab": src_vocab, "tgt_vocab": tgt_vocab}

# Create and train the model
model = onmt.models.build_model(model_opts)
trainer = onmt.Trainer(model, train_data, valid_data, optim)
trainer.train()

DeepKE:

from deepke import NERModel

# Define model parameters
model_params = {"model_type": "bert", "num_labels": num_labels}

# Create and train the model
model = NERModel("bert", "bert-base-cased", args=model_params)
model.train_model(train_data)

OpenNMT-py is more specialized for machine translation tasks, while DeepKE offers a broader range of NLP functionalities, including named entity recognition, relation extraction, and attribute extraction. DeepKE provides a simpler API for various NLP tasks, making it more accessible for users new to NLP. However, OpenNMT-py's focus on translation allows for more advanced and customizable translation models.

allennlp

11,862

An open-source NLP research library, built on PyTorch.

Pros of AllenNLP

More comprehensive and general-purpose NLP toolkit
Larger community and more extensive documentation
Built on PyTorch, offering greater flexibility and ease of use

Cons of AllenNLP

Steeper learning curve for beginners
Less focused on specific knowledge extraction tasks
May require more setup and configuration for specialized use cases

Code Comparison

DeepKE example (entity extraction):

from deepke.name_entity_re.standard import *

model = NERModel('bert', 'bert-base-chinese', num_labels=len(label2id))
model.train_model(train_data)
predictions = model.predict(["我在北京大学学习"])

AllenNLP example (named entity recognition):

from allennlp.predictors import Predictor

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/ner-model-2020.02.10.tar.gz")
result = predictor.predict(sentence="The girl went to Harvard University.")

Both libraries offer streamlined APIs for NLP tasks, but DeepKE focuses more on knowledge extraction, while AllenNLP provides a broader range of NLP functionalities. DeepKE's API is more tailored for specific tasks, whereas AllenNLP's approach is more generalized and requires additional configuration for specialized use cases.

README

A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Graph Construction

DeepKE is a knowledge extraction toolkit for knowledge graph construction supporting cnSchema, low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction. We provide documents, online demo, paper, slides and poster for beginners.

❗Want to use Large Language Models with DeepKE? Try DeepKE-LLM and OneKE, have fun!
❗Want to train supervised models? Try Quick Start. We provide NER models (e.g., LightNER (COLING'22), W2NER (AAAI'22)), relation extraction models (e.g., KnowPrompt (WWW'22)), and relational triple extraction models (e.g., ASP (EMNLP'22), PRGC (ACL'21), PURE (NAACL'21)), and release off-the-shelf models at DeepKE-cnSchema. Have fun!
We recommend using Linux; if using Windows, please use \\ in file paths.
If HuggingFace is inaccessible, please consider using wisemodel or ModelScope.

If you encounter any issues during the installation of DeepKE and DeepKE-LLM, please check Tips or promptly submit an issue, and we will assist you with resolving the problem!

Table of Contents
What's New
Prediction Demo
Model Framework
Quick Start
Tips
To do
Reading Materials
Related Toolkit
Citation
Contributors
Other Knowledge Extraction Open-Source Projects

What's New

June, 2025 We integrate the MCP service tools into DeepKE, enabling knowledge extraction through large language models (LLMs) as tool callers for lightweight models.
December, 2024 We open source the OneKE knowledge extraction framework, supporting multi-agent knowledge extraction across various scenarios.
April, 2024 We release a new bilingual (Chinese and English) schema-based information extraction model called OneKE based on Chinese-Alpaca-2-13B.
Feb, 2024 We release a large-scale (0.32B tokens) high-quality bilingual (Chinese and English) Information Extraction (IE) instruction dataset named IEPile, along with two models trained with IEPile, baichuan2-13b-iepile-lora and llama2-13b-iepile-lora.
Sep, 2023 We release a bilingual (Chinese and English) Information Extraction (IE) instruction dataset called InstructIE for the Instruction-based Knowledge Graph Construction (Instruction-based KGC) task, as detailed here.
June, 2023 We update DeepKE-LLM to support knowledge extraction with KnowLM, ChatGLM, LLaMA-series, GPT-series etc.
Apr, 2023 We have added new models, including CP-NER(IJCAI'23), ASP(EMNLP'22), PRGC(ACL'21), PURE(NAACL'21), provided event extraction capabilities (Chinese and English), and offered compatibility with higher versions of Python packages (e.g., Transformers).
Feb, 2023 We have supported using LLM (GPT-3) with in-context learning (based on EasyInstruct) & data generation, added a NER model W2NER(AAAI'22).

Previous News

Nov, 2022 Add data annotation instructions for entity recognition and relation extraction, automatic labelling of weakly supervised data (entity extraction and relation extraction), and optimize multi-GPU training.
Sept, 2022 The paper DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population has been accepted by the EMNLP 2022 System Demonstration Track.
Aug, 2022 We have added data augmentation (Chinese, English) support for low-resource relation extraction.
June, 2022 We have added multimodal support for entity and relation extraction.
May, 2022 We have released DeepKE-cnschema with off-the-shelf knowledge extraction models.
Jan, 2022 We have released a paper DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population
Dec, 2021 We have added a Dockerfile to create the environment automatically.
Nov, 2021 The demo of DeepKE, supporting real-time extraction without deployment or training, has been released.
The documentation of DeepKE, containing details such as source code and datasets, has been released.
Oct, 2021 pip install deepke is supported.
The code of deepke-v2.0 has been released.
Aug, 2019 The code of deepke-v1.0 has been released.
Aug, 2018 The DeepKE project started, and the code of deepke-v0.1 has been released.

Prediction Demo

There is a demonstration of prediction. The GIF file is created by Terminalizer. Get the code.

Model Framework

DeepKE contains a unified framework for named entity recognition, relation extraction and attribute extraction, the three knowledge extraction functions.
Each task can be implemented in different scenarios. For example, we can achieve relation extraction in standard, low-resource (few-shot), document-level and multimodal settings.
Each application scenario comprises three components: Data (Tokenizer, Preprocessor and Loader), Model (Module, Encoder and Forwarder), and Core (Training, Evaluation and Prediction).
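The task and scenario combinations map onto the example directories named later in this README. The following is a minimal sketch of that mapping; the `EXAMPLES` table and the `example_dir` helper are hypothetical names introduced for illustration and are not part of DeepKE.

```python
from pathlib import PurePosixPath

# Task/scenario pairs taken from the example directories named in this README.
EXAMPLES = {
    "ner": ("standard", "few-shot", "multimodal"),
    "re": ("standard", "few-shot", "document", "multimodal"),
    "ae": ("standard",),
    "ee": ("standard",),
}


def example_dir(task: str, scenario: str, root: str = "DeepKE/example") -> str:
    """Return the example directory for a task/scenario pair (hypothetical helper)."""
    if scenario not in EXAMPLES.get(task, ()):
        raise ValueError(f"no {scenario!r} example listed for task {task!r}")
    return str(PurePosixPath(root) / task / scenario)


print(example_dir("re", "document"))  # DeepKE/example/re/document
```

Each of these directories holds its own run.py, predict.py and conf folder, so the lookup only tells you where to `cd` before following the steps for that task.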

Quick Start

DeepKE-LLM

In the era of large models, DeepKE-LLM utilizes a completely new environment dependency.

conda create -n deepke-llm python=3.9
conda activate deepke-llm

cd example/llm
pip install -r requirements.txt

Please note that the requirements.txt file is located in the example/llm folder.

DeepKE-MCP-Tools

We integrate the MCP (Model Context Protocol) service tools into DeepKE, enabling knowledge extraction through large language models (LLMs) as tool callers for lightweight models.

The MCP service has been deployed and is accessible at URL.
For local deployment, refer to the README for detailed operational procedures.

DeepKE

DeepKE can be installed with pip install deepke.
The steps below use fully supervised relation extraction as an example.
DeepKE supports both manual and Docker image environment configuration; choose whichever suits your setup.
Installing DeepKE in a Linux environment is highly recommended.

🔧 Manual Environment Configuration

Step1 Download the basic code

git clone --depth 1 https://github.com/zjunlp/DeepKE.git

Step2 Create a virtual environment using Anaconda and enter it.

conda create -n deepke python=3.8

conda activate deepke

Install DeepKE with source code

pip install -r requirements.txt

python setup.py install

python setup.py develop

Install DeepKE with pip (NOT recommended!)
```
pip install deepke
```
- Please make sure that pip version <= 24.0

Step3 Enter the task directory

cd DeepKE/example/re/standard

Step4 Download the dataset, or follow the annotation instructions to obtain data

wget 121.41.117.246:8080/Data/re/standard/data.tar.gz

tar -xzvf data.tar.gz

Many data formats are supported; details are given in each task's section.

Step5 Training (Parameters for training can be changed in the conf folder)

We support visual parameter tuning by using wandb.

python run.py

Step6 Prediction (Parameters for prediction can be changed in the conf folder)

Modify the path of the trained model in predict.yaml. The absolute path of the model must be used, such as xxx/checkpoints/2019-12-03_17-35-30/cnn_epoch21.pth.

python predict.py
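Because predict.yaml expects an absolute path, it can help to resolve the checkpoint path programmatically before pasting it in. The sketch below assumes checkpoints are saved as .pth files somewhere under a checkpoints directory, as in the example path above. The `latest_checkpoint` helper is hypothetical, and picking the most recently modified file is only a heuristic, not something DeepKE does itself.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_root: str, suffix: str = ".pth") -> Optional[str]:
    """Return the absolute path of the most recently modified checkpoint, or None."""
    root = Path(checkpoint_root)
    if not root.is_dir():
        return None  # nothing has been trained yet, or the path is wrong
    candidates = list(root.rglob(f"*{suffix}"))
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return str(newest.resolve())


# Prints None when no checkpoint exists under ./checkpoints.
print(latest_checkpoint("checkpoints"))
```

The printed value can then be copied into predict.yaml by hand.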

❗NOTE: if you encounter any errors, please refer to the Tips or submit a GitHub issue.

🐳 Building With Docker Images

Step1 Install the Docker client

Install Docker and start the Docker service.

Step2 Pull the docker image and run the container

docker pull zjunlp/deepke:latest
docker run -it zjunlp/deepke:latest /bin/bash

The remaining steps are the same as Step 3 and onwards in Manual Environment Configuration.

❗NOTE: You can refer to the Tips to speed up installation.

Requirements

DeepKE

python == 3.8

torch>=1.5,<=1.11
hydra-core==1.0.6
tensorboard==2.4.1
matplotlib==3.4.1
transformers==4.26.0
jieba==0.42.1
scikit-learn==0.24.1
seqeval==1.2.2
opt-einsum==3.3.0
wandb==0.12.7
ujson==5.6.0
huggingface_hub==0.11.0
tensorboardX==2.5.1
nltk==3.8
protobuf==3.20.1
numpy==1.21.0
ipdb==0.13.11
pytorch-crf==0.7.2
tqdm==4.66.1
openai==0.28.0
Jinja2==3.1.2
datasets==2.13.2
pyhocon==0.3.60
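Since most of these requirements are exact pins, a quick check of the active environment can catch version drift before training. The following is a rough sketch using only the standard library. `parse_pins` and `find_mismatches` are hypothetical helpers, and range specifiers such as the torch line are skipped rather than evaluated.

```python
from importlib import metadata  # standard library on Python 3.8+
from typing import Dict, List, Tuple


def parse_pins(lines: List[str]) -> Dict[str, str]:
    """Collect exact 'name==version' pins; range specs (e.g. torch>=1.5) are skipped."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        if name.lower() == "python":
            continue  # the interpreter version is not an installable package
        pins[name] = version
    return pins


def find_mismatches(pins: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (package, pinned, installed) for every pin that is not satisfied."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


pins = parse_pins(["torch>=1.5,<=1.11", "hydra-core==1.0.6", "transformers==4.26.0"])
for name, pinned, installed in find_mismatches(pins):
    print(f"{name}: pinned {pinned}, found {installed}")
```

In practice the lines would come from requirements.txt, for example via `open("requirements.txt").read().splitlines()`.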

Introduction of Three Functions

1. Named Entity Recognition

Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, etc.

The data is stored in .txt files. Some example instances follow (users can label data with tools such as Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):

| Sentence | Person | Location | Organization |
| --- | --- | --- | --- |
| 本报北京9月4日讯记者杨涌报道：部分省区人民日报宣传发行工作座谈会9月3日在4日在京举行。 | 杨涌 | 北京 | 人民日报 |
| 《红楼梦》由王扶林导演，周汝昌、王蒙、周岭等多位专家参与制作。 | 王扶林，周汝昌，王蒙，周岭 | | |
| 秦始皇兵马俑位于陕西省西安市,是世界八大奇迹之一。 | 秦始皇 | 陕西省，西安市 | |
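To make the table concrete, here is a sketch that turns one row into character-level tags. It assumes a BIO tagging scheme and the label names PER and LOC; the README does not specify the on-disk .txt format, so treat both as assumptions. `bio_tags` is a hypothetical helper, not a DeepKE function.

```python
from typing import Dict, List


def bio_tags(sentence: str, entities: Dict[str, List[str]]) -> List[str]:
    """Tag each character with B-/I-<label> or O, given entity strings per label."""
    tags = ["O"] * len(sentence)
    for label, mentions in entities.items():
        for mention in mentions:
            start = sentence.find(mention) if mention else -1  # first occurrence only
            if start == -1:
                continue  # mention is empty or not present in the sentence
            tags[start] = f"B-{label}"
            for i in range(start + 1, start + len(mention)):
                tags[i] = f"I-{label}"
    return tags


row = "秦始皇兵马俑位于陕西省西安市,是世界八大奇迹之一。"
tags = bio_tags(row, {"PER": ["秦始皇"], "LOC": ["陕西省", "西安市"]})
for char, tag in zip(row, tags):
    print(char, tag)
```

Only the first occurrence of each mention is tagged, which is enough for these sample rows but would miss repeated mentions in longer text.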

Read the detailed process in specific README
- STANDARD (Fully Supervised)
  
  We support LLM and provide the off-the-shelf model, DeepKE-cnSchema-NER, which will extract entities in cnSchema without training.
  
  Step1 Enter DeepKE/example/ner/standard. Download the dataset.
```
wget 121.41.117.246:8080/Data/ner/standard/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- FEW-SHOT
  
  Step1 Enter DeepKE/example/ner/few-shot. Download the dataset.
```
wget 121.41.117.246:8080/Data/ner/few_shot/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training in the low-resource setting
  
  The directory from which the model is loaded and to which it is saved, as well as the configuration parameters, can be customized in the conf folder.
```
python run.py +train=few_shot
```
  Users can modify load_path in conf/train/few_shot.yaml to load an existing model.
  
  Step3 Add - predict to conf/config.yaml, set load_path to the model path and write_path to the path where the predicted results are saved in conf/predict.yaml, and then run python predict.py
```
python predict.py
```
- MULTIMODAL
  
  Step1 Enter DeepKE/example/ner/multimodal. Download the dataset.
```
wget 121.41.117.246:8080/Data/ner/multimodal/data.tar.gz

tar -xzvf data.tar.gz
```
  We use RCNN detected objects and visual grounding objects from original images as visual local information, where RCNN via faster_rcnn and visual grounding via onestage_grounding.
  
  Step2 Training in the multimodal setting
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - Resume from a previously trained model: set load_path in conf/train.yaml to the path where that model was saved. The directory for training logs can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

2. Relation Extraction

Relation extraction is the task of extracting semantic relations between entities from unstructured text.

The data is stored in .csv files. Some example instances follow (users can label data with tools such as Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):

| Sentence | Relation | Head | Head_offset | Tail | Tail_offset |
| --- | --- | --- | --- | --- | --- |
| 《岳父也是爹》是王军执导的电视剧，由马恩然、范明主演。 | 导演 | 岳父也是爹 | 1 | 王军 | 8 |
| 《九玄珠》是在纵横中文网连载的一部小说，作者是龙马。 | 连载网站 | 九玄珠 | 1 | 纵横中文网 | 7 |
| 提起杭州的美景，西湖总是第一个映入脑海的词语。 | 所在城市 | 西湖 | 8 | 杭州 | 2 |
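In the sample rows, the offsets are zero-based character positions of the head and tail entities within the sentence. A small sanity check along these lines can catch mislabeled rows before training. The lowercase column names and the `offsets_match` helper are assumptions made for this sketch; check the task README for the exact CSV header.

```python
import csv
import io

# Two of the sample rows above, written as CSV with assumed lowercase headers.
SAMPLE = """sentence,relation,head,head_offset,tail,tail_offset
《岳父也是爹》是王军执导的电视剧，由马恩然、范明主演。,导演,岳父也是爹,1,王军,8
提起杭州的美景，西湖总是第一个映入脑海的词语。,所在城市,西湖,8,杭州,2
"""


def offsets_match(row: dict) -> bool:
    """Check that head and tail occur in the sentence at their stated offsets."""
    for name in ("head", "tail"):
        start = int(row[f"{name}_offset"])
        if not row["sentence"].startswith(row[name], start):
            return False
    return True


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print([offsets_match(r) for r in rows])  # [True, True]
```

The attribute extraction data uses the same layout with Ent_offset and Val_offset columns, so the same check applies there after renaming the fields.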

!NOTE: If there are multiple entity types for one relation, entity types can be prefixed with the relation as inputs.
Read the detailed process in specific README
- STANDARD (Fully Supervised)
  
  We support LLM and provide the off-the-shelf model, DeepKE-cnSchema-RE, which will extract relations in cnSchema without training.
  
  Step1 Enter the DeepKE/example/re/standard folder. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/standard/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- FEW-SHOT
  
  Step1 Enter DeepKE/example/re/few-shot. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/few_shot/data.tar.gz

tar -xzvf data.tar.gz
```
  Step 2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - Resume from a previously trained model: set train_from_saved_model in conf/train.yaml to the path where that model was saved. The directory for training logs can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- DOCUMENT
  
  Step1 Enter DeepKE/example/re/document. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/document/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - Resume from a previously trained model: set train_from_saved_model in conf/train.yaml to the path where that model was saved. The directory for training logs can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- MULTIMODAL
  
  Step1 Enter DeepKE/example/re/multimodal. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/multimodal/data.tar.gz

tar -xzvf data.tar.gz
```
  We use RCNN detected objects and visual grounding objects from original images as visual local information, where RCNN via faster_rcnn and visual grounding via onestage_grounding.
  
  Step2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - Resume from a previously trained model: set load_path in conf/train.yaml to the path where that model was saved. The directory for training logs can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

3. Attribute Extraction

Attribute extraction extracts attributes of entities from unstructured text.

The data is stored in .csv files. Some example instances follow:

| Sentence | Att | Ent | Ent_offset | Val | Val_offset |
| --- | --- | --- | --- | --- | --- |
| 张冬梅，女，汉族，1968年2月生，河南淇县人 | 民族 | 张冬梅 | 0 | 汉族 | 6 |
| 诸葛亮，字孔明，三国时期杰出的军事家、文学家、发明家。 | 朝代 | 诸葛亮 | 0 | 三国时期 | 8 |
| 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |

Read the detailed process in specific README
- STANDARD (Fully Supervised)
  
  Step1 Enter the DeepKE/example/ae/standard folder. Download the dataset.
```
wget 121.41.117.246:8080/Data/ae/standard/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

4. Event Extraction

Event extraction is the task of extracting event types, event trigger words and event arguments from unstructured text.
The data is stored in .tsv files. Some example instances follow:

| Sentence | Event type | Trigger | Role | Argument |
| --- | --- | --- | --- | --- |
| 据《欧洲时报》报道，当地时间27日，法国巴黎卢浮宫博物馆员工因不满工作条件恶化而罢工，导致该博物馆也因此闭门谢客一天。 | 组织行为-罢工 | 罢工 | 罢工人员 | 法国巴黎卢浮宫博物馆员工 |
| | | | 时间 | 当地时间27日 |
| | | | 所属组织 | 法国巴黎卢浮宫博物馆 |
| 中国外运2019年上半年归母净利润增长17%：收购了少数股东股权 | 财经/交易-出售/收购 | 收购 | 出售方 | 少数股东 |
| | | | 收购方 | 中国外运 |
| | | | 交易物 | 股权 |
| 美国亚特兰大航展13日发生一起表演机坠机事故，飞行员弹射出舱并安全着陆，事故没有造成人员伤亡。 | 灾害/意外-坠机 | 坠机 | 时间 | 13日 |
| | | | 地点 | 美国亚特兰 |
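In the table, one event spans several rows: the first row carries the sentence, event type and trigger, and each following row with empty leading cells adds another role/argument pair. A sketch of regrouping such rows into one record per event follows. The tuple layout and the `group_events` helper are assumptions for illustration, and the actual DuEE files may be structured differently.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, str, str]  # sentence, event type, trigger, role, argument

ROWS: List[Row] = [
    ("中国外运2019年上半年归母净利润增长17%：收购了少数股东股权", "财经/交易-出售/收购", "收购", "出售方", "少数股东"),
    ("", "", "", "收购方", "中国外运"),
    ("", "", "", "交易物", "股权"),
    ("美国亚特兰大航展13日发生一起表演机坠机事故，飞行员弹射出舱并安全着陆，事故没有造成人员伤亡。", "灾害/意外-坠机", "坠机", "时间", "13日"),
]


def group_events(rows: List[Row]) -> List[Dict]:
    """Merge continuation rows (empty sentence cell) into the preceding event."""
    events: List[Dict] = []
    for sentence, event_type, trigger, role, argument in rows:
        if sentence:  # a non-empty sentence cell starts a new event
            events.append({"sentence": sentence, "event_type": event_type,
                           "trigger": trigger, "arguments": []})
        if not events:
            raise ValueError("continuation row appears before any event row")
        events[-1]["arguments"].append((role, argument))
    return events


for event in group_events(ROWS):
    print(event["event_type"], event["trigger"], event["arguments"])
```

Arguments are kept as a list of pairs rather than a dict so that an event with two arguments in the same role loses nothing.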

Read the detailed process in specific README
- STANDARD(Fully Supervised)
  
  Step1 Enter the DeepKE/example/ee/standard folder. Download the dataset.
```
wget 121.41.117.246:8080/Data/ee/DuEE.zip
unzip DuEE.zip
```
  Step 2 Training
  
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step 3 Prediction
```
python predict.py
```

Tips

1. Using the nearest mirror speeds up installation: for example, the THU mirror in China for Anaconda, and the aliyun mirror in China for pip install XXX.

2. When encountering ModuleNotFoundError: No module named 'past', run pip install future.

3. Downloading pretrained language models online is slow. We recommend downloading the pretrained models before use and saving them in the pretrained folder. Read README.md in each task directory to check the specific requirements for saving pretrained models.

4. The old version of DeepKE is in the deepke-v1.0 branch. Users can switch to that branch to use the old version. The old version has been fully migrated to the standard relation extraction example (example/re/standard).

5. If you want to modify the source code, install DeepKE from source; otherwise the modifications will not take effect. See issue

6. More related low-resource knowledge extraction work can be found in Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective.

7. Make sure the installed package versions exactly match those in requirements.txt.

To do

In the next version, we plan to release a stronger LLM for KE.

Meanwhile, we will offer long-term maintenance to fix bugs, solve issues and meet new requests. If you have any problems, please submit an issue.

Reading Materials

Data-Efficient Knowledge Graph Construction, 高效知识图谱构建 (Tutorial on CCKS 2022) [slides]

Efficient and Robust Knowledge Graph Construction (Tutorial on AACL-IJCNLP 2022) [slides]

PromptKG Family: a Gallery of Prompt Learning & KG-related Research Works, Toolkits, and Paper-list [Resources]

Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective [Survey][Paper-list]

Related Toolkit

Doccano, MarkTool, LabelStudio: Data Annotation Toolkits

LambdaKG: A library and benchmark for PLM-based KG embeddings

EasyInstruct: An easy-to-use framework to instruct Large Language Models

Citation

Please cite our paper if you use DeepKE in your work

@inproceedings{EMNLP2022_Demo_DeepKE,
  author    = {Ningyu Zhang and
               Xin Xu and
               Liankuan Tao and
               Haiyang Yu and
               Hongbin Ye and
               Shuofei Qiao and
               Xin Xie and
               Xiang Chen and
               Zhoubo Li and
               Lei Li},
  editor    = {Wanxiang Che and
               Ekaterina Shutova},
  title     = {DeepKE: {A} Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population},
  booktitle = {{EMNLP} (Demos)},
  pages     = {98--108},
  publisher = {Association for Computational Linguistics},
  year      = {2022},
  url       = {https://aclanthology.org/2022.emnlp-demos.10}
}

Contributors

Ningyu Zhang, Haofen Wang, Fei Huang, Feiyu Xiong, Liankuan Tao, Xin Xu, Honghao Gui, Zhenru Zhang, Chuanqi Tan, Qiang Chen, Xiaohan Wang, Zekun Xi, Xinrong Li, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Peng Wang, Yuqi Zhu, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Jing Chen, Yuqi Zhu, Yujie Luo, Shumin Deng, Wen Zhang, Guozhou Zheng, Huajun Chen

Community Contributors: Shuo Shen, Zhoutian Shao, Wei Hu, thredreams, eltociear, Ziwen Xu, Rui Huang, Xiaolong Weng

Other Knowledge Extraction Open-Source Projects
