Top Related Projects
An Open-Source Package for Neural Relation Extraction (NRE)
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
Python package built to ease deep learning on graph, on top of existing DL frameworks.
Generate embeddings from large-scale graph-structured data.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
An open-source NLP research library, built on PyTorch.
Quick Overview
DeepKE is an open-source knowledge extraction toolkit supporting low-resource, document-level, and multimodal scenarios. It provides a unified framework for various knowledge extraction tasks, including named entity recognition, relation extraction, and attribute extraction. DeepKE aims to make knowledge extraction more accessible and efficient for researchers and practitioners.
Pros
- Supports multiple knowledge extraction tasks in a unified framework
- Offers pre-trained models and easy-to-use interfaces for quick deployment
- Provides support for low-resource scenarios, making it useful for languages or domains with limited data
- Includes multimodal capabilities, allowing for knowledge extraction from text and images
Cons
- May have a steeper learning curve for users unfamiliar with knowledge extraction concepts
- Documentation could be more comprehensive, especially for advanced use cases
- Limited support for languages other than English and Chinese
- Performance may vary depending on the specific task and dataset
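Code Examples
The snippets below sketch the intended usage. Class names such as NERModel, REModel, and AEModel are not confirmed by the README reproduced later on this page, which documents a script-based workflow (python run.py, python predict.py); treat them as illustrative.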
- Named Entity Recognition (NER):
from deepke.name_entity_re import *
# Load pre-trained NER model
model = NERModel("bert", "bert-base-chinese", labels=["PER", "ORG", "LOC"])
# Perform NER on a given text
text = "张三在北京大学工作"
result = model.predict(text)
print(result)
- Relation Extraction:
from deepke.relation_extraction import *
# Load pre-trained relation extraction model
model = REModel("bert", "bert-base-chinese")
# Extract relations from a sentence
sentence = "苹果公司的总部位于加利福尼亚州"
subject = "苹果公司"
tail = "加利福尼亚州"  # named tail to avoid shadowing the built-in object
result = model.predict(sentence, subject, tail)
print(result)
- Attribute Extraction:
from deepke.attribute_extraction import *
# Load pre-trained attribute extraction model
model = AEModel("bert", "bert-base-chinese")
# Extract attributes from a given text
text = "这款手机的屏幕尺寸为6.1英寸,电池容量为3000mAh"
result = model.predict(text)
print(result)
Getting Started
To get started with DeepKE, follow these steps:
- Install DeepKE:
pip install deepke
- Import the desired module:
from deepke.name_entity_re import NERModel
from deepke.relation_extraction import REModel
from deepke.attribute_extraction import AEModel
- Load a pre-trained model and use it for prediction:
model = NERModel("bert", "bert-base-chinese", labels=["PER", "ORG", "LOC"])
result = model.predict("张三在北京大学工作")
print(result)
For more detailed instructions and advanced usage, refer to the official DeepKE documentation.
Competitor Comparisons
An Open-Source Package for Neural Relation Extraction (NRE)
Pros of OpenNRE
- Focuses specifically on neural relation extraction, providing a more specialized toolkit
- Offers pre-trained models for quick deployment and testing
- Includes a comprehensive evaluation module for model performance analysis
Cons of OpenNRE
- Limited to relation extraction tasks, while DeepKE covers a broader range of knowledge extraction tasks
- Less flexibility in terms of customization and integration with other NLP tasks
- Smaller community and fewer updates compared to DeepKE
Code Comparison
OpenNRE:
from opennre import encoder, model, framework
# Define the model (named re_model so the imported model module is not shadowed)
re_model = model.SoftmaxNN(
    ckpt='bert-base-uncased',
    encoder=encoder.BERTEncoder(max_length=80)
)
# Load data and train
framework.train_model(re_model, train_path='train.txt', val_path='val.txt')
DeepKE:
from deepke import extraction
# Initialize and train the model
extraction.set_config(task='relation_extraction', model='bert')
extraction.train()
# Predict using the trained model
extraction.predict(text="Apple Inc. was founded by Steve Jobs.")
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
Pros of PaddleNLP
- Comprehensive NLP toolkit with a wide range of pre-trained models and datasets
- Seamless integration with PaddlePaddle deep learning framework
- Extensive documentation and tutorials for ease of use
Cons of PaddleNLP
- Primarily focused on Chinese NLP tasks, which may limit its applicability for other languages
- Steeper learning curve for users not familiar with PaddlePaddle ecosystem
Code Comparison
PaddleNLP:
from paddlenlp import Taskflow
ner = Taskflow("ner")
result = ner("华为是一家总部位于广东省深圳市的中国大型通信设备公司")
print(result)
DeepKE:
from deepke.name_entity_re import NER
ner = NER("bert")
result = ner.predict("华为是一家总部位于广东省深圳市的中国大型通信设备公司")
print(result)
Both repositories provide tools for natural language processing tasks, with a focus on named entity recognition in this example. PaddleNLP offers a more comprehensive toolkit within the PaddlePaddle ecosystem, while DeepKE provides a specialized framework for knowledge extraction tasks. The choice between them depends on the specific requirements of the project and familiarity with the respective ecosystems.
Python package built to ease deep learning on graph, on top of existing DL frameworks.
Pros of DGL
- Broader scope: Focuses on general graph neural networks, applicable to various domains
- More mature project with larger community and extensive documentation
- Supports multiple deep learning frameworks (PyTorch, MXNet, TensorFlow)
Cons of DGL
- Steeper learning curve due to its more general-purpose nature
- May require more code for specific knowledge extraction tasks
- Less specialized for knowledge graph and relation extraction tasks
Code Comparison
DeepKE example (named entity recognition):
from deepke.name_entity_re.standard import *
model = NERModel('bert', 'bert-base-uncased', num_labels=9)
model.train_model(train_data)
predictions = model.predict(test_data)
DGL example (graph neural network):
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        # Non-linearity between the two graph convolutions
        h = F.relu(self.conv1(g, in_feat))
        return self.conv2(g, h)
Generate embeddings from large-scale graph-structured data.
Pros of PyTorch-BigGraph
- Designed for large-scale graph embedding, capable of handling billions of nodes and edges
- Supports distributed training across multiple machines for improved performance
- Offers a variety of loss functions and edge sampling techniques
Cons of PyTorch-BigGraph
- Focused primarily on graph embeddings, less versatile for other NLP tasks
- Steeper learning curve due to its specialized nature and distributed computing features
- Less active development and community support compared to DeepKE
Code Comparison
PyTorch-BigGraph:
config = torchbiggraph.config.parse_config({
    'entities': {'all': {'num_partitions': 1}},
    'relations': [{'name': 'all', 'lhs': 'all', 'rhs': 'all'}],
    'dimension': 100,
    'max_epochs': 50,
    'num_batch_negs': 1000,
    'num_uniform_negs': 1000,
})
DeepKE:
config = {
    "model_name": "bert-base-uncased",
    "max_seq_len": 128,
    "batch_size": 32,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
}
model = NERModel("bert", "bert-base-uncased", args=config)
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
Pros of OpenNMT-py
- More mature and widely adopted project with extensive documentation
- Supports a broader range of neural machine translation architectures
- Active community and regular updates
Cons of OpenNMT-py
- Focused primarily on machine translation, less versatile for other NLP tasks
- Steeper learning curve for beginners in NLP
Code Comparison
OpenNMT-py:
import onmt
# Define model parameters
model_opts = {"model_type": "transformer", "src_vocab": src_vocab, "tgt_vocab": tgt_vocab}
# Create and train the model
model = onmt.models.build_model(model_opts)
trainer = onmt.Trainer(model, train_data, valid_data, optim)
trainer.train()
DeepKE:
from deepke import NERModel
# Define model parameters
model_params = {"model_type": "bert", "num_labels": num_labels}
# Create and train the model
model = NERModel("bert", "bert-base-cased", args=model_params)
model.train_model(train_data)
OpenNMT-py is more specialized for machine translation tasks, while DeepKE offers a broader range of NLP functionalities, including named entity recognition, relation extraction, and attribute extraction. DeepKE provides a simpler API for various NLP tasks, making it more accessible for users new to NLP. However, OpenNMT-py's focus on translation allows for more advanced and customizable translation models.
An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- More comprehensive and general-purpose NLP toolkit
- Larger community and more extensive documentation
- Built on PyTorch, offering greater flexibility and ease of use
Cons of AllenNLP
- Steeper learning curve for beginners
- Less focused on specific knowledge extraction tasks
- May require more setup and configuration for specialized use cases
Code Comparison
DeepKE example (entity extraction):
from deepke.name_entity_re.standard import *
model = NERModel('bert', 'bert-base-chinese', num_labels=len(label2id))
model.train_model(train_data)
predictions = model.predict(["我在北京大学学习"])
AllenNLP example (named entity recognition):
from allennlp.predictors import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/ner-model-2020.02.10.tar.gz")
result = predictor.predict(sentence="The girl went to Harvard University.")
Both libraries offer streamlined APIs for NLP tasks, but DeepKE focuses more on knowledge extraction, while AllenNLP provides a broader range of NLP functionalities. DeepKE's API is more tailored for specific tasks, whereas AllenNLP's approach is more generalized and requires additional configuration for specialized use cases.
README
A Deep Learning Based Knowledge Extraction Toolkit
for Knowledge Graph Construction
DeepKE is a knowledge extraction toolkit for knowledge graph construction supporting cnSchema, low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction. We provide documents, online demo, paper, slides and poster for beginners.
- Want to use Large Language Models with DeepKE? Try DeepKE-LLM and OneKE, have fun!
- Want to train supervised models? Try Quick Start. We provide NER models (e.g., LightNER (COLING'22), W2NER (AAAI'22)), relation extraction models (e.g., KnowPrompt (WWW'22)), and relational triple extraction models (e.g., ASP (EMNLP'22), PRGC (ACL'21), PURE (NAACL'21)), and release off-the-shelf models at DeepKE-cnSchema, have fun!
- We recommend using Linux; if using Windows, please use \\ in file paths.
- If HuggingFace is inaccessible, please consider using wisemodel or modelscope.
If you encounter any issues during the installation of DeepKE and DeepKE-LLM, please check Tips or promptly submit an issue, and we will assist you with resolving the problem!
Table of Contents
- Table of Contents
- What's New
- Prediction Demo
- Model Framework
- Quick Start
- Tips
- To do
- Reading Materials
- Related Toolkit
- Citation
- Contributors
- Other Knowledge Extraction Open-Source Projects
What's New
- April, 2024: We release a new bilingual (Chinese and English) schema-based information extraction model called OneKE based on Chinese-Alpaca-2-13B.
- Feb, 2024: We release a large-scale (0.32B tokens) high-quality bilingual (Chinese and English) Information Extraction (IE) instruction dataset named IEPile, along with two models trained with IEPile, baichuan2-13b-iepile-lora and llama2-13b-iepile-lora.
- Sep, 2023: A bilingual Chinese-English Information Extraction (IE) instruction dataset called InstructIE was released for the Instruction-based Knowledge Graph Construction task (Instruction-based KGC), as detailed in here.
- June, 2023: We update DeepKE-LLM to support knowledge extraction with KnowLM, ChatGLM, LLaMA-series, GPT-series etc.
- Apr, 2023: We have added new models, including CP-NER (IJCAI'23), ASP (EMNLP'22), PRGC (ACL'21), PURE (NAACL'21), provided event extraction capabilities (Chinese and English), and offered compatibility with higher versions of Python packages (e.g., Transformers).
- Feb, 2023: We have supported using LLM (GPT-3) with in-context learning (based on EasyInstruct) and data generation, and added a NER model, W2NER (AAAI'22).
Previous News
- Nov, 2022: Add data annotation instructions for entity recognition and relation extraction, automatic labelling of weakly supervised data (entity extraction and relation extraction), and optimize multi-GPU training.
- Sept, 2022: The paper DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population has been accepted by the EMNLP 2022 System Demonstration Track.
- Aug, 2022: We have added data augmentation (Chinese, English) support for low-resource relation extraction.
- June, 2022: We have added multimodal support for entity and relation extraction.
- May, 2022: We have released DeepKE-cnschema with off-the-shelf knowledge extraction models.
- Jan, 2022: We have released a paper, DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population.
- Dec, 2021: We have added a dockerfile to create the environment automatically.
- Nov, 2021: The demo of DeepKE, supporting real-time extraction without deploying and training, has been released. The documentation of DeepKE, containing details such as source codes and datasets, has been released.
- Oct, 2021: pip install deepke is supported. The codes of deepke-v2.0 have been released.
- Aug, 2019: The codes of deepke-v1.0 have been released.
- Aug, 2018: The DeepKE project started and the codes of deepke-v0.1 have been released.
Prediction Demo
There is a demonstration of prediction. The GIF file is created by Terminalizer. Get the code.
Model Framework
- DeepKE contains a unified framework for named entity recognition, relation extraction and attribute extraction, the three knowledge extraction functions.
- Each task can be implemented in different scenarios. For example, we can achieve relation extraction in standard, low-resource (few-shot), document-level and multimodal settings.
- Each application scenario comprises three components: Data (including Tokenizer, Preprocessor and Loader), Model (including Module, Encoder and Forwarder), and Core (including Training, Evaluation and Prediction).
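The three-component split described above can be pictured with a toy sketch. Every class and field name here is hypothetical and chosen only to mirror the Data / Model / Core wording; none of it is DeepKE's actual code, and the "encoder" is a stand-in that returns token lengths.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical names throughout: this mirrors the Data / Model / Core split
# described in the text, not DeepKE's real classes.

@dataclass
class Data:
    tokenizer: Callable[[str], List[str]]  # plays the role of the Tokenizer

    def load(self, sentences: List[str]) -> List[List[str]]:
        # Preprocessor and Loader collapsed into one step for brevity
        return [self.tokenizer(s.strip()) for s in sentences]

@dataclass
class Model:
    encoder: Callable[[List[str]], List[int]]  # plays the role of the Encoder

    def forward(self, tokens: List[str]) -> List[int]:
        # Forwarder: here it simply returns the encoder output
        return self.encoder(tokens)

@dataclass
class Core:
    data: Data
    model: Model

    def predict(self, sentences: List[str]) -> List[List[int]]:
        # Prediction: run every loaded example through the model
        return [self.model.forward(tokens) for tokens in self.data.load(sentences)]

pipeline = Core(
    data=Data(tokenizer=str.split),
    model=Model(encoder=lambda tokens: [len(t) for t in tokens]),
)
print(pipeline.predict(["DeepKE extracts knowledge"]))
```

Swapping the tokenizer or encoder changes one component without touching the others, which is the point of the separation.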
Quick Start
DeepKE-LLM
In the era of large models, DeepKE-LLM utilizes a completely new environment dependency.
conda create -n deepke-llm python=3.9
conda activate deepke-llm
cd example/llm
pip install -r requirements.txt
Please note that the requirements.txt file is located in the example/llm folder.
DeepKE
- DeepKE supports pip install deepke. Take the fully supervised relation extraction for example.
- DeepKE supports both manual and Docker image environment configuration; you can choose the appropriate way to build.
- It is highly recommended to install DeepKE in a Linux environment.
Manual Environment Configuration
Step1 Download the basic code
git clone --depth 1 https://github.com/zjunlp/DeepKE.git
Step2 Create a virtual environment using Anaconda and enter it.
conda create -n deepke python=3.8
conda activate deepke
- Install DeepKE with source code:
pip install -r requirements.txt
python setup.py install
python setup.py develop
- Install DeepKE with pip (NOT recommended!):
pip install deepke
Step3 Enter the task directory
cd DeepKE/example/re/standard
Step4 Download the dataset, or follow the annotation instructions to obtain data
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
Many data formats are supported; details are given in each task's section.
Step5 Training (Parameters for training can be changed in the conf folder)
We support visual parameter tuning by using wandb.
python run.py
Step6 Prediction (Parameters for prediction can be changed in the conf folder)
Modify the path of the trained model in predict.yaml. The absolute path of the model needs to be used, such as xxx/checkpoints/2019-12-03_17-35-30/cnn_epoch21.pth.
python predict.py
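Since predict.yaml expects an absolute path, a small helper can look up the newest checkpoint for you. This is a minimal sketch: the function name latest_checkpoint is hypothetical, and it assumes checkpoints are laid out as <root>/<timestamp>/<name>.pth, as in the example path above.

```python
from pathlib import Path
from typing import Optional

def latest_checkpoint(checkpoint_root: str) -> Optional[str]:
    """Return the absolute path of the most recently modified .pth file, or None.

    Assumes the layout <checkpoint_root>/<run timestamp>/<name>.pth.
    """
    candidates = list(Path(checkpoint_root).glob("*/*.pth"))
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return str(newest.resolve())

if __name__ == "__main__":
    # Paste the printed path into predict.yaml
    print(latest_checkpoint("checkpoints"))
```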
- NOTE: If you encounter any errors, please refer to the Tips or submit a GitHub issue.
Building With Docker Images
Step1 Install the Docker client
Install Docker and start the Docker service.
Step2 Pull the docker image and run the container
docker pull zjunlp/deepke:latest
docker run -it zjunlp/deepke:latest /bin/bash
The remaining steps are the same as Step 3 and onwards in Manual Environment Configuration.
- NOTE: You can refer to the Tips to speed up installation.
Requirements
DeepKE
- python == 3.8
- torch>=1.5,<=1.11
- hydra-core==1.0.6
- tensorboard==2.4.1
- matplotlib==3.4.1
- transformers==4.26.0
- jieba==0.42.1
- scikit-learn==0.24.1
- seqeval==1.2.2
- opt-einsum==3.3.0
- wandb==0.12.7
- ujson==5.6.0
- huggingface_hub==0.11.0
- tensorboardX==2.5.1
- nltk==3.8
- protobuf==3.20.1
- numpy==1.21.0
- ipdb==0.13.11
- pytorch-crf==0.7.2
- tqdm==4.66.1
- openai==0.28.0
- Jinja2==3.1.2
- datasets==2.13.2
- pyhocon==0.3.60
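Because the versions above are exact pins, it can help to compare them with what is actually installed before training. The sketch below uses only the standard library; PINNED copies a few entries from the list, and the helper names installed_version and find_mismatches are illustrative rather than part of DeepKE.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# A few pins copied from the requirements list; extend as needed.
PINNED: Dict[str, str] = {
    "hydra-core": "1.0.6",
    "transformers": "4.26.0",
    "seqeval": "1.2.2",
    "pytorch-crf": "0.7.2",
}

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to what was found."""
    found = {name: lookup(name) for name in pinned}
    return {name: ver for name, ver in found.items() if ver != pinned[name]}

if __name__ == "__main__":
    for name, ver in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {ver or 'not installed'}")
```

The lookup parameter exists so the comparison can be exercised without touching the real environment.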
Introduction of Three Functions
1. Named Entity Recognition
- Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, etc.
- The data is stored in .txt files. Some instances are shown below (users can label data with tools such as Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):
Sentence | Person | Location | Organization
---|---|---|---
本报北京9月4日讯记者杨涌报道:部分省区人民日报宣传发行工作座谈会9月3日在4日在京举行。 | 杨涌 | 北京 | 人民日报
《红楼梦》由王扶林导演,周汝昌、王蒙、周岭等多位专家参与制作。 | 王扶林,周汝昌,王蒙,周岭 | |
秦始皇兵马俑位于陕西省西安市,是世界八大奇迹之一。 | 秦始皇 | 陕西省,西安市 |
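To make the relationship between a sentence and its labelled entities concrete, the sketch below turns one table row into character-level tags. The BIO scheme and the label names PER and LOC are assumptions made for illustration; the text above only states that the data lives in .txt files, so check the task README for the exact on-disk format.

```python
from typing import Dict, List

def bio_tags(sentence: str, entities: Dict[str, List[str]]) -> List[str]:
    """Tag each character as B-<label>, I-<label> or O.

    Only the first occurrence of each mention is tagged and overlaps are
    not resolved, which is enough for a sketch.
    """
    tags = ["O"] * len(sentence)
    for label, mentions in entities.items():
        for mention in mentions:
            start = sentence.find(mention)
            if start < 0 or not mention:
                continue  # mention is empty or not present in the sentence
            tags[start] = f"B-{label}"
            for i in range(start + 1, start + len(mention)):
                tags[i] = f"I-{label}"
    return tags

sentence = "秦始皇兵马俑位于陕西省西安市,是世界八大奇迹之一。"
tags = bio_tags(sentence, {"PER": ["秦始皇"], "LOC": ["陕西省", "西安市"]})
for char, tag in zip(sentence, tags):
    print(char, tag)
```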
- Read the detailed process in specific README
- We support LLM and provide the off-the-shelf model, DeepKE-cnSchema-NER, which will extract entities in cnSchema without training.
Standard setting (example/ner/standard)
Step1 Enter DeepKE/example/ner/standard. Download the dataset.
wget 120.27.214.45/Data/ner/standard/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data folder and conf folder respectively.
python run.py
Step3 Prediction
python predict.py
Few-shot setting (example/ner/few-shot)
Step1 Enter DeepKE/example/ner/few-shot. Download the dataset.
wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training in the low-resource setting
The directory where the model is loaded and saved and the configuration parameters can be customized in the conf folder.
python run.py +train=few_shot
Users can modify load_path in conf/train/few_shot.yaml to start from an existing model.
Step3 Prediction
Add - predict to conf/config.yaml, set load_path to the model path and write_path to the path where the predicted results are saved in conf/predict.yaml, and then run:
python predict.py
Multimodal setting (example/ner/multimodal)
Step1 Enter DeepKE/example/ner/multimodal. Download the dataset.
wget 120.27.214.45/Data/ner/multimodal/data.tar.gz
tar -xzvf data.tar.gz
We use RCNN-detected objects and visual grounding objects from the original images as local visual information, where RCNN detection uses faster_rcnn and visual grounding uses onestage_grounding.
Step2 Training in the multimodal setting
- The dataset and parameters can be customized in the data folder and conf folder respectively.
- To start from the model trained last time, set load_path in conf/train.yaml to the path where that model was saved. The path for logs generated in training can be customized with log_dir.
python run.py
Step3 Prediction
python predict.py
2. Relation Extraction
- Relation extraction is the task of extracting semantic relations between entities from unstructured text.
- The data is stored in .csv files. Some instances are shown below (users can label data with tools such as Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):
Sentence | Relation | Head | Head_offset | Tail | Tail_offset
---|---|---|---|---|---
《岳父也是爹》是王军执导的电视剧,由马恩然、范明主演。 | 导演 | 岳父也是爹 | 1 | 王军 | 8
《九玄珠》是在纵横中文网连载的一部小说,作者是龙马。 | 连载网站 | 九玄珠 | 1 | 纵横中文网 | 7
提起杭州的美景,西湖总是第一个映入脑海的词语。 | 所在城市 | 西湖 | 8 | 杭州 | 2
- NOTE: If there are multiple entity types for one relation, entity types can be prefixed with the relation as inputs.
- Read the detailed process in specific README
- We support LLM and provide the off-the-shelf model, DeepKE-cnSchema-RE, which will extract relations in cnSchema without training.
Standard setting (example/re/standard)
Step1 Enter the DeepKE/example/re/standard folder. Download the dataset.
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data folder and conf folder respectively.
python run.py
Step3 Prediction
python predict.py
Few-shot setting (example/re/few-shot)
Step1 Enter DeepKE/example/re/few-shot. Download the dataset.
wget 120.27.214.45/Data/re/few_shot/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
- The dataset and parameters can be customized in the data folder and conf folder respectively.
- To start from the model trained last time, set train_from_saved_model in conf/train.yaml to the path where that model was saved. The path for logs generated in training can be customized with log_dir.
python run.py
Step3 Prediction
python predict.py
Document-level setting (example/re/document)
Step1 Enter DeepKE/example/re/document. Download the dataset.
wget 120.27.214.45/Data/re/document/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
- The dataset and parameters can be customized in the data folder and conf folder respectively.
- To start from the model trained last time, set train_from_saved_model in conf/train.yaml to the path where that model was saved. The path for logs generated in training can be customized with log_dir.
python run.py
Step3 Prediction
python predict.py
Multimodal setting (example/re/multimodal)
Step1 Enter DeepKE/example/re/multimodal. Download the dataset.
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
We use RCNN-detected objects and visual grounding objects from the original images as local visual information, where RCNN detection uses faster_rcnn and visual grounding uses onestage_grounding.
Step2 Training
- The dataset and parameters can be customized in the data folder and conf folder respectively.
- To start from the model trained last time, set load_path in conf/train.yaml to the path where that model was saved. The path for logs generated in training can be customized with log_dir.
python run.py
Step3 Prediction
python predict.py
3. Attribute Extraction
- Attribute extraction extracts attributes for entities in unstructured text.
- The data is stored in .csv files. Some instances are shown below:
Sentence | Att | Ent | Ent_offset | Val | Val_offset
---|---|---|---|---|---
张冬梅,女,汉族,1968年2月生,河南淇县人 | 民族 | 张冬梅 | 0 | 汉族 | 6
诸葛亮,字孔明,三国时期杰出的军事家、文学家、发明家。 | 朝代 | 诸葛亮 | 0 | 三国时期 | 8
2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0
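When preparing your own attribute data, the two offset columns can be computed rather than counted by hand. The sketch below builds one record and writes it as CSV; the lower-cased field names and the helper make_row are assumptions for illustration, and str.find only locates the first occurrence, so repeated mentions need manual care.

```python
import csv
import io

# Column names follow the attribute table (Sentence, Att, Ent, Ent_offset, Val, Val_offset),
# lower-cased here as an assumption about the file on disk.
FIELDS = ["sentence", "att", "ent", "ent_offset", "val", "val_offset"]

def make_row(sentence: str, att: str, ent: str, val: str) -> dict:
    """Build one record, locating ent and val with str.find (first occurrence only)."""
    ent_offset = sentence.find(ent)
    val_offset = sentence.find(val)
    if ent_offset < 0 or val_offset < 0:
        raise ValueError("entity or value not found in sentence")
    return {
        "sentence": sentence, "att": att,
        "ent": ent, "ent_offset": ent_offset,
        "val": val, "val_offset": val_offset,
    }

row = make_row("张冬梅,女,汉族,1968年2月生,河南淇县人", "民族", "张冬梅", "汉族")

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```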
- Read the detailed process in specific README
Standard setting (example/ae/standard)
Step1 Enter the DeepKE/example/ae/standard folder. Download the dataset.
wget 120.27.214.45/Data/ae/standard/data.tar.gz
tar -xzvf data.tar.gz
Step2 Training
The dataset and parameters can be customized in the data folder and conf folder respectively.
python run.py
Step3 Prediction
python predict.py
4. Event Extraction
- Event extraction is the task of extracting event types, event trigger words, and event arguments from unstructured text.
- The data is stored in .tsv files. Some instances are as follows:
Sentence | Event type | Trigger | Role | Argument
---|---|---|---|---
据《欧洲时报》报道,当地时间27日,法国巴黎卢浮宫博物馆员工因不满工作条件恶化而罢工,导致该博物馆也因此闭门谢客一天。 | 组织行为-罢工 | 罢工 | 罢工人员 | 法国巴黎卢浮宫博物馆员工
 | | | 时间 | 当地时间27日
 | | | 所属组织 | 法国巴黎卢浮宫博物馆
中国外运2019年上半年归母净利润增长17%:收购了少数股东股权 | 财经/交易-出售/收购 | 收购 | 出售方 | 少数股东
 | | | 收购方 | 中国外运
 | | | 交易物 | 股权
美国亚特兰大航展13日发生一起表演机坠机事故,飞行员弹射出舱并安全着陆,事故没有造成人员伤亡。 | 灾害/意外-坠机 | 坠机 | 时间 | 13日
 | | | 地点 | 美国亚特兰
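Each event in the table spans several rows, one per role and argument pair. The sketch below groups such rows back into one record per event; the tuple layout, the sentence_id field, and the function name group_events are all hypothetical and chosen for illustration, not taken from DeepKE or the DuEE file format.

```python
from typing import Dict, List, Tuple

# (sentence_id, event_type, trigger, role, argument), one tuple per role row of the table.
ROWS: List[Tuple[int, str, str, str, str]] = [
    (0, "组织行为-罢工", "罢工", "罢工人员", "法国巴黎卢浮宫博物馆员工"),
    (0, "组织行为-罢工", "罢工", "时间", "当地时间27日"),
    (0, "组织行为-罢工", "罢工", "所属组织", "法国巴黎卢浮宫博物馆"),
    (1, "财经/交易-出售/收购", "收购", "出售方", "少数股东"),
    (1, "财经/交易-出售/收购", "收购", "收购方", "中国外运"),
    (1, "财经/交易-出售/收购", "收购", "交易物", "股权"),
]

def group_events(rows) -> List[Dict]:
    """Collect role/argument rows into one record per (sentence, event type, trigger)."""
    events: Dict[Tuple[int, str, str], Dict] = {}
    for sentence_id, event_type, trigger, role, argument in rows:
        key = (sentence_id, event_type, trigger)
        event = events.setdefault(key, {
            "sentence_id": sentence_id,
            "event_type": event_type,
            "trigger": trigger,
            "arguments": [],
        })
        event["arguments"].append({"role": role, "argument": argument})
    return list(events.values())

for event in group_events(ROWS):
    print(event["event_type"], event["trigger"], len(event["arguments"]))
```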
- Read the detailed process in specific README
Standard setting (example/ee/standard)
Step1 Enter the DeepKE/example/ee/standard folder. Download the dataset.
wget 120.27.214.45/Data/ee/DuEE.zip
unzip DuEE.zip
Step2 Training
The dataset and parameters can be customized in the data folder and conf folder respectively.
python run.py
Step3 Prediction
python predict.py
Tips
1. Using the nearest mirror will speed up installation: THU in China for Anaconda, and aliyun in China for pip install XXX.
2. When encountering ModuleNotFoundError: No module named 'past', run pip install future.
3. Downloading pretrained language models online is slow. We recommend downloading pretrained models before use and saving them in the pretrained folder. Read README.md in every task directory to check the specific requirement for saving pretrained models.
4. The old version of DeepKE is in the deepke-v1.0 branch. Users can change the branch to use the old version. The old version has been fully transferred to the standard relation extraction (example/re/standard).
5. If you want to modify the source code, it's recommended to install DeepKE from source. Otherwise, the modification will not take effect. See issue
6. More related low-resource knowledge extraction works can be found in Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective.
7. Make sure the installed packages match the exact versions listed in requirements.txt.
To do
In the next version, we plan to release a stronger LLM for KE.
Meanwhile, we will offer long-term maintenance to fix bugs, solve issues and meet new requests. So if you have any problems, please submit issues to us.
Reading Materials
Data-Efficient Knowledge Graph Construction, 高效知识图谱构建 (Tutorial on CCKS 2022) [slides]
Efficient and Robust Knowledge Graph Construction (Tutorial on AACL-IJCNLP 2022) [slides]
PromptKG Family: a Gallery of Prompt Learning & KG-related Research Works, Toolkits, and Paper-list [Resources]
Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective [Survey][Paper-list]
Related Toolkit
Doccano, MarkTool, LabelStudio: Data Annotation Toolkits
LambdaKG: A library and benchmark for PLM-based KG embeddings
EasyInstruct: An easy-to-use framework to instruct Large Language Models
Citation
Please cite our paper if you use DeepKE in your work
@inproceedings{EMNLP2022_Demo_DeepKE,
author = {Ningyu Zhang and
Xin Xu and
Liankuan Tao and
Haiyang Yu and
Hongbin Ye and
Shuofei Qiao and
Xin Xie and
Xiang Chen and
Zhoubo Li and
Lei Li},
editor = {Wanxiang Che and
Ekaterina Shutova},
title = {DeepKE: {A} Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population},
booktitle = {{EMNLP} (Demos)},
pages = {98--108},
publisher = {Association for Computational Linguistics},
year = {2022},
url = {https://aclanthology.org/2022.emnlp-demos.10}
}
Contributors
Ningyu Zhang, Haofen Wang, Fei Huang, Feiyu Xiong, Liankuan Tao, Xin Xu, Honghao Gui, Zhenru Zhang, Chuanqi Tan, Qiang Chen, Xiaohan Wang, Zekun Xi, Xinrong Li, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Peng Wang, Yuqi Zhu, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Jing Chen, Yuqi Zhu, Shumin Deng, Wen Zhang, Guozhou Zheng, Huajun Chen
Community Contributors: thredreams, eltociear, Ziwen Xu, Rui Huang, Xiaolong Weng
Other Knowledge Extraction Open-Source Projects