Top Related Projects
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
Quick Overview
OpenNRE is an open-source and extensible toolkit for neural relation extraction. It provides a unified framework with various neural models and benchmarks for relation extraction tasks. The toolkit aims to facilitate research and development in the field of relation extraction by offering easy-to-use interfaces and comprehensive documentation.
Pros
- Comprehensive collection of relation extraction models and datasets
- Easy-to-use interfaces for training, evaluation, and prediction
- Extensible architecture allowing for easy integration of new models
- Well-documented with detailed tutorials and examples
Cons
- Primarily focused on English language datasets
- May require significant computational resources for training large models
- Limited support for multi-lingual relation extraction
- Some advanced features may have a steeper learning curve for beginners
Code Examples
- Loading a pre-trained model and making predictions:
import opennre
model = opennre.get_model('wiki80_bert_softmax')
result = model.infer({'text': 'He was the son of Máel Dúin mac Máele Fithrich, and grandson of the high king Áed Uaridnach (died 612).', 'h': {'pos': (18, 46)}, 't': {'pos': (78, 91)}})
print(result)
- Training a custom model:
import json
from opennre import encoder, model, framework
# Load the relation-to-id mapping (assumes the wiki80 benchmark files have been downloaded)
with open('./benchmark/wiki80/wiki80_rel2id.json') as f:
    rel2id = json.load(f)
# Define your model architecture
sentence_encoder = encoder.BERTEncoder(
    max_length=80,
    pretrain_path='bert-base-uncased'
)
re_model = model.SoftmaxNN(sentence_encoder, len(rel2id), rel2id)
# Set up the framework and train (the validation file is reused as the test file here)
re_framework = framework.SentenceRE(
    train_path='./benchmark/wiki80/wiki80_train.txt',
    val_path='./benchmark/wiki80/wiki80_val.txt',
    test_path='./benchmark/wiki80/wiki80_val.txt',
    model=re_model,
    ckpt='ckpt/wiki80_bert_softmax.pth.tar',
    batch_size=64,
    max_epoch=10,
    lr=2e-5,
    opt='adamw'
)
re_framework.train_model()
- Evaluating a model on a custom dataset:
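import json
import torch
from opennre import encoder, model, framework
# Load the relation-to-id mapping used during training
with open('./benchmark/wiki80/wiki80_rel2id.json') as f:
    rel2id = json.load(f)
# Rebuild the same architecture as the trained model
sentence_encoder = encoder.BERTEncoder(
    max_length=80,
    pretrain_path='bert-base-uncased'
)
re_model = model.SoftmaxNN(sentence_encoder, len(rel2id), rel2id)
# Point test_path at the dataset you want to evaluate
re_framework = framework.SentenceRE(
    model=re_model,
    train_path='./benchmark/wiki80/wiki80_train.txt',
    val_path='./benchmark/wiki80/wiki80_val.txt',
    test_path='./benchmark/wiki80/wiki80_val.txt',
    ckpt='ckpt/wiki80_bert_softmax.pth.tar'
)
# Load the trained weights and evaluate on the test split
re_framework.load_state_dict(torch.load('ckpt/wiki80_bert_softmax.pth.tar')['state_dict'])
result = re_framework.eval_model(re_framework.test_loader)
print(result)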
Getting Started
To get started with OpenNRE, follow these steps:
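- Install the library from source (the README below notes that a Python package release is still pending):
git clone https://github.com/thunlp/OpenNRE.git
cd OpenNRE
pip install -r requirements.txt
python setup.py install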
- Download a pre-trained model:
import opennre
model = opennre.get_model('wiki80_bert_softmax')
- Use the model for inference:
result = model.infer({'text': 'Bill Gates is the founder of Microsoft.', 'h': {'pos': (0, 10)}, 't': {'pos': (29, 38)}})
print(result)
For more detailed instructions and advanced usage, refer to the official documentation and examples in the GitHub repository.
Competitor Comparisons
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
Pros of DeepKE
- Supports a wider range of knowledge extraction tasks, including named entity recognition and attribute extraction, in addition to relation extraction
- Offers more diverse model architectures and pre-training options
- Provides better documentation and tutorials for ease of use
Cons of DeepKE
- May have a steeper learning curve due to its broader scope and more complex architecture
- Potentially requires more computational resources for training and inference
- Less focused on relation extraction specifically compared to OpenNRE
Code Comparison
OpenNRE example:
sentence_encoder = opennre.encoder.BERTEncoder(
    max_length=80,
    pretrain_path='bert-base-uncased'
)
model = opennre.model.SoftmaxNN(sentence_encoder, len(rel2id), rel2id)
DeepKE example:
config = AutoConfig.from_pretrained(args.bert_path)
model = AutoModel.from_pretrained(args.bert_path, config=config)
re_model = REModel(config, model, num_labels=len(label2id))
Both frameworks use pre-trained models, but DeepKE offers more flexibility in model selection and configuration. OpenNRE provides a more streamlined approach for relation extraction tasks, while DeepKE allows for greater customization across various knowledge extraction tasks.
README
OpenNRE (sub-project of OpenSKL)
OpenNRE is a sub-project of OpenSKL, providing an Open-source Neural Relation Extraction toolkit for extracting structured knowledge from plain text, with ATT as its key feature for taking relation-associated text information into account.
Overview
OpenNRE is an open-source and extensible toolkit that provides a unified framework to implement relation extraction models. We unify the input and output interfaces of different relation extraction models and provide scalable options for each model. The toolkit covers both supervised and distant supervised settings, and is compatible with both conventional neural networks and pre-trained language models.
Relation extraction is a natural language processing (NLP) task aiming at extracting relations (e.g., founder of) between entities (e.g., Bill Gates and Microsoft). For example, from the sentence Bill Gates founded Microsoft, we can extract the relation triple (Bill Gates, founder of, Microsoft).
Relation extraction is a crucial technique in automatic knowledge graph construction. By using relation extraction, we can incrementally extract new relation facts and expand the knowledge graph, which, as a way for machines to understand the human world, has many downstream applications such as question answering, recommender systems, and search engines. If you want to learn more about neural relation extraction, visit another project of ours (NREPapers).
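To make the idea of accumulating relation facts concrete, here is a minimal sketch of storing extracted triples in a small in-memory graph. The `Triple` type and `add_triples` helper are illustrative names, not part of OpenNRE, and the nested-dictionary layout is just one possible representation.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A relation fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def add_triples(graph, triples):
    """Add triples to a graph stored as head -> relation -> set of tails.

    Returns the number of facts that were not already in the graph.
    """
    added = 0
    for triple in triples:
        tails = graph[triple.head][triple.relation]
        if triple.tail not in tails:
            tails.add(triple.tail)
            added += 1
    return added


# Hypothetical extraction output: the same fact found in two sentences.
graph = defaultdict(lambda: defaultdict(set))
new_facts = add_triples(graph, [
    Triple("Bill Gates", "founder of", "Microsoft"),
    Triple("Bill Gates", "founder of", "Microsoft"),  # duplicate, not counted
])
print(new_facts)
```

Storing tails in a set means a fact extracted repeatedly from different sentences expands the graph only once.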
It's our honor to help you better explore relation extraction with our OpenNRE toolkit! You can refer to our documentation for more details about this project.
Models
In this toolkit, we support CNN-based relation extraction models including standard CNN and our proposed CNN+ATT. We also implement methods based on pre-trained language models (BERT).
Evaluation
To validate the effectiveness of this toolkit, we employ the Bag-Level Relation Extraction task for evaluation.
Settings
We use the NYT10 dataset, a distantly supervised collection derived from the New York Times corpus and Freebase. We mainly experiment on the CNN-ATT model, which employs instance-level attention and outperforms the vanilla CNN.
Results
We report AUC and F1 scores of two models. The right two columns, marked with (*), show results sourced from Gao et al. (2021) and Lin et al. (2016). The results show that our implementation of the CNN-ATT model is slightly better than the original paper's, and also confirm that CNN-ATT outperforms the standard CNN model.
| Model | AUC | F1 | AUC (Paper *) | F1 (Paper *) |
|---|---|---|---|---|
| CNN | - | - | 0.212 | 0.318 |
| CNN-ATT | 0.333 | 0.397 | 0.318 | 0.380 |
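For readers unfamiliar with these metrics, the sketch below shows one common way AUC and F1 are computed in bag-level relation extraction: predicted facts are ranked by confidence, a precision-recall curve is traced down the ranking, AUC is the area under that curve, and F1 is the maximum over all cut-offs. Whether this matches the toolkit's evaluation code in every detail is an assumption, and `pr_auc_and_max_f1` is a hypothetical helper name.

```python
def pr_auc_and_max_f1(scored_facts, num_gold):
    """Compute PR-curve AUC and max F1 from ranked predictions.

    scored_facts: list of (score, is_correct) pairs, one per predicted fact.
    num_gold: total number of gold facts in the test set.
    """
    ranked = sorted(scored_facts, key=lambda item: item[0], reverse=True)
    precisions, recalls = [], []
    correct = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        precisions.append(correct / rank)
        recalls.append(correct / num_gold)

    # Trapezoidal area under the precision-recall curve.
    auc = 0.0
    for k in range(1, len(recalls)):
        width = recalls[k] - recalls[k - 1]
        auc += width * (precisions[k] + precisions[k - 1]) / 2

    # Best F1 over every cut-off in the ranking.
    f1_scores = [2 * p * r / (p + r) for p, r in zip(precisions, recalls) if p + r > 0]
    max_f1 = max(f1_scores, default=0.0)
    return auc, max_f1


# Toy ranking (made-up scores): two gold facts, four predictions.
auc, max_f1 = pr_auc_and_max_f1(
    [(0.9, True), (0.8, False), (0.7, True), (0.6, False)], num_gold=2
)
print(round(auc, 3), round(max_f1, 3))
```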
Usage
Installation
Install as A Python Package
We are now working on deploying OpenNRE as a Python package. Coming soon!
Using Git Repository
Clone the repository from our GitHub page (don't forget to star us!)
git clone https://github.com/thunlp/OpenNRE.git
If it is too slow, you can try
git clone https://github.com/thunlp/OpenNRE.git --depth 1
Then install all the requirements:
pip install -r requirements.txt
Note: Please choose an appropriate PyTorch version for your machine (it depends on your CUDA version). For details, refer to https://pytorch.org/.
Then install the package with
python setup.py install
If you also want to modify the code, run this:
python setup.py develop
Note that we have excluded all data and pretrain files for fast deployment. You can manually download them by running the scripts in the benchmark and pretrain folders. For example, if you want to download the FewRel dataset, you can run
bash benchmark/download_fewrel.sh
Data
You can go into the benchmark folder and download datasets using our scripts. We also list some of the information about the datasets in this document. We provide two distantly-supervised datasets with human-annotated test sets, NYT10m and Wiki20m. Check the datasets section for details.
Easy Start
Make sure you have installed OpenNRE as instructed above. Then import our package and load pre-trained models.
>>> import opennre
>>> model = opennre.get_model('wiki80_cnn_softmax')
Note that it may take a few minutes to download the checkpoint and data the first time. Then use infer to do sentence-level relation extraction:
>>> model.infer({'text': 'He was the son of Máel Dúin mac Máele Fithrich, and grandson of the high king Áed Uaridnach (died 612).', 'h': {'pos': (18, 46)}, 't': {'pos': (78, 91)}})
('father', 0.5108704566955566)
You will get the relation result and its confidence score.
If you want to use the model on your GPU, just run
>>> model = model.cuda()
before calling the inference function.
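The `pos` values in the example above are character offsets into `text`, with the end offset exclusive (the head entity spans characters 18 to 46). Counting these by hand is error-prone, so a small helper can build the input for you. The sketch below is not part of OpenNRE: `make_infer_input` is a hypothetical name, and it assumes each entity should be matched at its first occurrence in the sentence.

```python
def make_infer_input(text, head, tail):
    """Build the dict passed to model.infer from a sentence and two entity strings."""

    def span(entity):
        # Locate the first occurrence; the end offset is exclusive.
        start = text.find(entity)
        if start < 0:
            raise ValueError(f"{entity!r} not found in text")
        return (start, start + len(entity))

    return {"text": text, "h": {"pos": span(head)}, "t": {"pos": span(tail)}}


example = make_infer_input(
    "He was the son of Máel Dúin mac Máele Fithrich, and grandson of the "
    "high king Áed Uaridnach (died 612).",
    head="Máel Dúin mac Máele Fithrich",
    tail="Áed Uaridnach",
)
print(example["h"]["pos"], example["t"]["pos"])  # (18, 46) (78, 91)
```

The resulting dictionary can be passed directly as `model.infer(example)`. If an entity string appears more than once in the sentence, pick the offsets explicitly instead of relying on the first match.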
For now, we have the following available models:
- wiki80_cnn_softmax: trained on the wiki80 dataset with a CNN encoder.
- wiki80_bert_softmax: trained on the wiki80 dataset with a BERT encoder.
- wiki80_bertentity_softmax: trained on the wiki80 dataset with a BERT encoder (using entity representation concatenation).
- tacred_bert_softmax: trained on the TACRED dataset with a BERT encoder.
- tacred_bertentity_softmax: trained on the TACRED dataset with a BERT encoder (using entity representation concatenation).
Training
You can train your own models on your own data with OpenNRE. In the example folder we provide example training code for supervised RE models and bag-level RE models. You can use either our provided datasets or your own. For example, you can use the following script to train a PCNN-ATT bag-level model on the NYT10 dataset with a manual test set. The ATT algorithm is a typical method for combining a bag of sentences to extract relations between entities.
python example/train_bag_cnn.py \
--metric auc \
--dataset nyt10m \
--batch_size 160 \
--lr 0.1 \
--weight_decay 1e-5 \
--max_epoch 100 \
--max_length 128 \
--seed 42 \
--encoder pcnn \
--aggr att
Or use the following script to train a BERT model on the Wiki80 dataset:
python example/train_supervised_bert.py \
--pretrain_path bert-base-uncased \
--dataset wiki80
We provide many options in the example training code and you can check them out for detailed instructions.
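To illustrate what ATT-style aggregation over a bag does, here is a minimal, dependency-free sketch: each sentence vector is scored against a relation query vector, the scores are turned into softmax weights, and the bag representation is the weighted sum. This is a simplification for illustration, not the toolkit's implementation. It uses a plain dot product as the scoring function, which is an assumption, and `attention_aggregate` is a hypothetical name.

```python
import math


def attention_aggregate(sentence_vecs, relation_query):
    """Combine a bag of sentence vectors into one bag vector via softmax attention."""
    # Score each sentence against the relation query (dot product, an assumption).
    scores = [sum(x * q for x, q in zip(vec, relation_query)) for vec in sentence_vecs]

    # Numerically stable softmax over the bag.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the sentence vectors.
    dim = len(relation_query)
    bag_vec = [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vecs)) for d in range(dim)
    ]
    return weights, bag_vec


# Toy bag of three 2-d sentence vectors (made-up numbers).
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, bag_vec = attention_aggregate(bag, query)
print(weights, bag_vec)
```

Sentences that align poorly with the relation query (the second one here) receive lower weight, which is how attention reduces the influence of noisy sentences in a distantly supervised bag.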
Citation
If you find OpenNRE useful for your research, please consider citing the following paper:
@inproceedings{han-etal-2019-opennre,
title = "{O}pen{NRE}: An Open and Extensible Toolkit for Neural Relation Extraction",
author = "Han, Xu and Gao, Tianyu and Yao, Yuan and Ye, Deming and Liu, Zhiyuan and Sun, Maosong",
booktitle = "Proceedings of EMNLP-IJCNLP: System Demonstrations",
year = "2019",
url = "https://www.aclweb.org/anthology/D19-3029",
doi = "10.18653/v1/D19-3029",
pages = "169--174"
}
This package is mainly contributed by Tianyu Gao, Xu Han, Shulian Cao, Lumin Tang, Yankai Lin, Zhiyuan Liu
About OpenSKL
OpenSKL project aims to harness the power of both structured knowledge and natural languages via representation learning. All sub-projects of OpenSKL, under the categories of Algorithm, Resource and Application, are as follows.
- Algorithm:
- OpenKE
- ERNIE
- An effective and efficient toolkit for augmenting pre-trained language models with knowledge graph representations.
- OpenNE
- An effective and efficient toolkit for representing nodes in large-scale graphs as embeddings, with TADW as key features to incorporate text attributes of nodes.
- OpenNRE
- Resource:
- The embeddings of large-scale knowledge graphs pre-trained by OpenKE, covering three typical large-scale knowledge graphs: Wikidata, Freebase, and XLORE. The embeddings are free to use under the MIT license, and please click the following link to submit download requests.
- OpenKE-Wikidata
- Wikidata is a free and collaborative database, collecting structured data to provide support for Wikipedia. The original Wikidata contains 20,982,733 entities, 594 relations and 68,904,773 triplets. In particular, Wikidata-5M is the core subgraph of Wikidata, containing 5,040,986 high-frequency entities from Wikidata with their corresponding 927 relations and 24,267,796 triplets.
- TransE version: Knowledge embeddings of Wikidata pre-trained by OpenKE.
- TransR version of Wikidata-5M: Knowledge embeddings of Wikidata-5M pre-trained by OpenKE.
- OpenKE-Freebase
- Freebase was a large collaborative knowledge base consisting of data composed mainly by its community members. It was an online collection of structured data harvested from many sources. Freebase contains 86,054,151 entities, 14,824 relations and 338,586,276 triplets.
- TransE version: Knowledge embeddings of Freebase pre-trained by OpenKE.
- OpenKE-XLORE
- XLORE is one of the most popular Chinese knowledge graphs developed by THUKEG. XLORE contains 10,572,209 entities, 138,581 relations and 35,954,249 triplets.
- TransE version: Knowledge embeddings of XLORE pre-trained by OpenKE.
- Application:
- Knowledge-Plugin
- An effective and efficient toolkit of plug-and-play knowledge injection for pre-trained language models. Knowledge-Plugin is general for all kinds of knowledge graph embeddings mentioned above. In the toolkit, we plug the TransR version of Wikidata-5M into BERT as an example of applications. With the TransR embedding, we enhance the knowledge ability of BERT without fine-tuning the original model, e.g., up to 8% improvement on question answering.