ltp

Language Technology Platform

5,148

1,056

5,148


Top Related Projects

HanLP

34,953

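Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, natural language processing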

snownlp

6,541

Python library for processing Chinese text

bert4keras

5,414

keras implement of transformers for humans

Quick Overview

LTP (Language Technology Platform) is an open-source Chinese natural language processing toolkit developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology. It provides a comprehensive set of Chinese language processing tools, including word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling.

Pros

Comprehensive suite of Chinese NLP tools in a single package
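High accuracy (the Base model scores 98.7 on word segmentation and 95.4 on named entity recognition in the project's benchmark table), plus a fast perceptron-based legacy model for speed-critical work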
Actively maintained and regularly updated with new features and improvements
Offers a Python interface, plus Rust and C/C++ bindings for the perceptron-based legacy models

Cons

Primarily focused on Chinese language, limiting its use for other languages
Requires some understanding of Chinese linguistics for optimal usage
Documentation is primarily in Chinese, which may be challenging for non-Chinese speakers
Resource-intensive for some tasks, potentially requiring significant computational power

Code Examples

Word segmentation and part-of-speech tagging:

from ltp import LTP

ltp = LTP("LTP/small")  # LTP 4.2+ API: loads the Small model from the Hugging Face Hub
output = ltp.pipeline(["我爱北京天安门"], tasks=["cws", "pos"])
print(output.cws)
print(output.pos)
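The segmentation and POS outputs are per-sentence lists aligned by word index, so pairing each word with its tag is a zip. The sketch below assumes that shape (one list of words and one list of tags per input sentence); `pair_words_with_tags` is a hypothetical helper, and the sample tags are written for illustration rather than copied from real LTP output.

```python
from typing import List, Tuple


def pair_words_with_tags(
    words_per_sentence: List[List[str]],
    tags_per_sentence: List[List[str]],
) -> List[List[Tuple[str, str]]]:
    """Zip each sentence's words with its POS tags, one list of pairs per sentence."""
    if len(words_per_sentence) != len(tags_per_sentence):
        raise ValueError("word and tag outputs cover a different number of sentences")
    paired = []
    for words, tags in zip(words_per_sentence, tags_per_sentence):
        if len(words) != len(tags):
            raise ValueError(f"{len(words)} words but {len(tags)} tags")
        paired.append(list(zip(words, tags)))
    return paired


# Hypothetical output shaped like the segmentation and POS results for one sentence
words = [["我", "爱", "北京", "天安门"]]
tags = [["r", "v", "ns", "ns"]]
for sentence in pair_words_with_tags(words, tags):
    print(" ".join(f"{w}/{t}" for w, t in sentence))  # 我/r 爱/v 北京/ns 天安门/ns
```

With a real result you would pass the model's segmentation and POS outputs instead of the literals; the length checks catch a mismatch early instead of silently truncating.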

Named entity recognition:

from ltp import LTP

ltp = LTP("LTP/small")
output = ltp.pipeline(["华北电力大学位于北京市昌平区"], tasks=["cws", "pos", "ner"])
print(output.ner)
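NER labels are assigned per word and have to be grouped into entity spans. This sketch assumes the B/I/E/S/O prefix scheme with type suffixes such as Nh (person), Ni (organization), and Ns (place) that earlier LTP releases documented; check the tag strings your model version actually emits before relying on it. `decode_entities` is a hypothetical helper, and the sample tags are illustrative.

```python
from typing import List, Optional, Tuple


def decode_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) spans from word-level B/I/E/S/O tags."""
    entities = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for word, tag in zip(words, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "S":
            # Single-word entity
            entities.append((entity_type, word))
            current_type, current_words = None, []
        elif prefix == "B":
            # Start a new multi-word entity
            current_type, current_words = entity_type, [word]
        elif prefix in ("I", "E") and current_type == entity_type:
            current_words.append(word)
            if prefix == "E":
                entities.append((entity_type, "".join(current_words)))
                current_type, current_words = None, []
        else:
            # "O" or an ill-formed sequence: drop any unfinished span
            current_type, current_words = None, []
    return entities


# Hypothetical word-level tags for "华北电力大学位于北京市昌平区"
words = ["华北", "电力", "大学", "位于", "北京市", "昌平区"]
tags = ["B-Ni", "I-Ni", "E-Ni", "O", "B-Ns", "E-Ns"]
print(decode_entities(words, tags))
# [('Ni', '华北电力大学'), ('Ns', '北京市昌平区')]
```

Dropping unfinished spans is a conservative choice: a malformed tag sequence yields fewer entities instead of spurious ones.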

Dependency parsing:

from ltp import LTP

ltp = LTP("LTP/small")
output = ltp.pipeline(["他送了一本书给我"], tasks=["cws", "dep"])
print(output.dep)
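A dependency parse is commonly stored as one head index and one relation label per word. Assuming 1-based head indices where 0 marks the root (the CoNLL convention; verify against the dependency output of your LTP version), the hypothetical helper below turns those arrays into readable (dependent, relation, head) triples. The sample heads and labels are made up for illustration.

```python
from typing import List, Tuple


def dependency_triples(
    words: List[str], heads: List[int], labels: List[str]
) -> List[Tuple[str, str, str]]:
    """Return (dependent, relation, head_word) for each word; head 0 means ROOT."""
    if not (len(words) == len(heads) == len(labels)):
        raise ValueError("words, heads and labels must have the same length")
    triples = []
    for word, head, label in zip(words, heads, labels):
        head_word = "ROOT" if head == 0 else words[head - 1]
        triples.append((word, label, head_word))
    return triples


# Hypothetical parse of "我爱北京": 爱 is the root, 我 its subject, 北京 its object
words = ["我", "爱", "北京"]
heads = [2, 0, 2]
labels = ["SBV", "HED", "VOB"]
for dependent, relation, head in dependency_triples(words, heads, labels):
    print(f"{dependent} --{relation}--> {head}")
```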

Getting Started

To get started with LTP, follow these steps:

Install LTP using pip:
```
pip install ltp
```
Import and initialize LTP in your Python script:
```
import ltp
ltp_model = ltp.LTP()
```

Use LTP for various NLP tasks:

text = ["我爱北京天安门"]
output = ltp_model.pipeline(text, tasks=["cws", "pos", "ner", "dep", "srl"])
print(output.cws)
print(output.pos)
print(output.ner)
print(output.dep)
print(output.srl)
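LTP's calls take a list of sentences, so running it on a longer passage means splitting the passage first. The hypothetical `split_sentences` helper below (not part of LTP) is a minimal regex-based splitter that assumes sentences end in 。！？ or ASCII ! and ?, optionally followed by closing quotes, and treats line breaks as boundaries; it does not handle ellipses or other edge cases.

```python
import re
from typing import List

# End-of-sentence punctuation, optionally followed by closing quotes
_SENTENCE_END = re.compile(r"([。！？!?]+[”’」』]*)")


def split_sentences(text: str) -> List[str]:
    """Split Chinese text into sentences, keeping the terminating punctuation."""
    sentences = []
    for line in text.splitlines():
        parts = _SENTENCE_END.split(line)
        # re.split with a capture group alternates [body, delimiter, body, ...]
        for body, end in zip(parts[0::2], parts[1::2] + [""]):
            sentence = (body + end).strip()
            if sentence:
                sentences.append(sentence)
    return sentences


print(split_sentences("我爱北京天安门。他叫汤姆去拿外衣！你呢？"))
# ['我爱北京天安门。', '他叫汤姆去拿外衣！', '你呢？']
```

The resulting list can be passed directly as the batch of sentences for a single LTP call.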

For more detailed usage and advanced features, refer to the official documentation on the GitHub repository.

Competitor Comparisons

jieba

34,028

Jieba Chinese word segmentation

Pros of jieba

Lightweight and easy to use, with simple installation and integration
Fast processing speed, especially for large-scale text segmentation tasks
Supports customization of dictionaries and user-defined words

Cons of jieba

Limited functionality compared to ltp, focusing primarily on word segmentation
Less accurate for complex linguistic tasks like named entity recognition or dependency parsing
Fewer options for fine-tuning and model customization

Code Comparison

jieba:

import jieba
seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
print("Default Mode: " + "/ ".join(seg_list))

ltp:

from ltp import LTP
ltp = LTP("LTP/small")
output = ltp.pipeline(["我来到北京清华大学"], tasks=["cws"])
print(output.cws)

Both libraries provide Chinese word segmentation, but ltp offers a more comprehensive set of NLP tools. jieba is simpler to use and faster for basic segmentation tasks, while ltp provides higher accuracy and additional linguistic analysis capabilities. The choice between them depends on the specific requirements of your project, balancing simplicity and speed against accuracy and advanced features.
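Much of jieba's speed comes from dictionary lookup. According to its README, jieba builds a directed acyclic graph (DAG) of every dictionary word that starts at each character, then uses dynamic programming to pick the path with the highest probability based on word frequency; unknown words are handled separately by an HMM, which this simplified sketch omits. The function below is an illustration of that idea, not jieba's code, and the toy dictionary and frequencies are made up.

```python
import math
from typing import Dict, List


def segment(sentence: str, freq: Dict[str, int]) -> List[str]:
    """Max-probability segmentation over a dictionary DAG (simplified, no HMM)."""
    total = sum(freq.values())
    n = len(sentence)
    # dag[i] lists end indices j such that sentence[i:j] is a dictionary word.
    # A single character is always allowed so every position has an edge.
    # (jieba stops early using prefix entries; this sketch scans every j.)
    dag = []
    for i in range(n):
        ends = [j for j in range(i + 1, n + 1) if sentence[i:j] in freq]
        if i + 1 not in ends:
            ends.append(i + 1)
        dag.append(ends)
    # best[i] = (log-probability of the best split of sentence[i:], next cut)
    best = [(0.0, n)] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(freq.get(sentence[i:j], 1)) - math.log(total) + best[j][0], j)
            for j in dag[i]
        )
    # Follow the stored cuts from the start of the sentence
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j])
        i = j
    return words


# Toy dictionary with invented frequencies
toy_freq = {
    "我": 50, "来": 20, "到": 30, "来到": 40, "北京": 60, "北": 10, "京": 10,
    "清华": 20, "大学": 30, "清华大学": 25, "华大": 5, "清": 5, "华": 5, "大": 10, "学": 10,
}
print(segment("我来到北京清华大学", toy_freq))  # ['我', '来到', '北京', '清华大学']
```

Because every extra word adds another negative log-probability term, longer dictionary matches such as 清华大学 usually win over splitting them into pieces.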

lac

3,931

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

Pros of LAC

Faster Chinese word segmentation and part-of-speech tagging
Simpler API and easier integration into existing projects
Better support for specialized domains like medicine and finance

Cons of LAC

Limited functionality compared to LTP (focuses mainly on word segmentation and POS tagging)
Less comprehensive documentation and community support
Chinese-only (a limitation LTP shares)

Code Comparison

LAC:

from LAC import LAC

lac = LAC(mode='lac')
text = "我爱北京天安门"
result = lac.run(text)
print(result)

LTP:

from ltp import LTP

ltp = LTP("LTP/small")
text = "我爱北京天安门"
output = ltp.pipeline([text], tasks=["cws", "pos"])
print(output.cws, output.pos)

Both repositories provide Chinese natural language processing tools, but they differ in scope and implementation. LAC offers a more streamlined approach for specific tasks, while LTP provides a broader range of NLP functionalities. The choice between them depends on the specific requirements of your project and the depth of NLP analysis needed.

HanLP

34,953

Pros of HanLP

More comprehensive feature set, including advanced NLP tasks like text classification and sentiment analysis
Better documentation and examples, making it easier for new users to get started
More active development and frequent updates

Cons of HanLP

Larger resource footprint, potentially slower for basic tasks
More complex setup and configuration process
May be overkill for simple NLP tasks

Code Comparison

HanLP:

from pyhanlp import *

text = "我爱北京天安门"
print(HanLP.segment(text))

LTP:

from ltp import LTP

text = "我爱北京天安门"
ltp = LTP("LTP/small")
output = ltp.pipeline([text], tasks=["cws"])
print(output.cws)

Both libraries handle basic Chinese word segmentation. HanLP's API is more direct for a single string, whereas LTP's API takes a list of sentences, which makes it easy to process several sentences in one call.
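Because LTP accepts a list of sentences per call, a large corpus is typically fed in fixed-size batches to bound memory use. The hypothetical `batched` helper below shows one way to do that; the batch size is an assumption you would tune for your hardware. On Python 3.12 and later, `itertools.batched` does the same job but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


sentences = ["我爱北京天安门。", "他叫汤姆去拿外衣。", "这个东西真心很赞。"]
for batch in batched(sentences, 2):
    # With a loaded model you would call LTP on `batch` here instead of printing it
    print(batch)
```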

snownlp

6,541

Python library for processing Chinese text

Pros of SnowNLP

Lightweight and easy to use for basic Chinese NLP tasks
Includes sentiment analysis functionality out of the box
Simpler installation process with fewer dependencies

Cons of SnowNLP

Less comprehensive feature set compared to LTP
Lower accuracy for complex NLP tasks
Less active development and community support

Code Comparison

SnowNLP example:

from snownlp import SnowNLP

s = SnowNLP(u'这个东西真心很赞')
print(s.words)         # [u'这个', u'东西', u'真心', u'很', u'赞']
print(s.tags)          # [(u'这个', u'r'), (u'东西', u'n'), (u'真心', u'd'), (u'很', u'd'), (u'赞', u'Vg')]
print(s.sentiments)    # 0.9769663402895832 # Positive sentiment

LTP example:

from ltp import LTP

ltp = LTP("LTP/small")
# LTP does not include a sentiment analyzer; it returns linguistic structure instead
output = ltp.pipeline(["这个东西真心很赞"], tasks=["cws", "pos", "ner", "dep", "srl"])

print(output.cws)
print(output.pos)
print(output.ner)
print(output.dep)
print(output.srl)

bert4keras

5,414

keras implement of transformers for humans

Pros of bert4keras

Focused specifically on BERT and related models, offering more specialized functionality
Simpler API and easier to use for BERT-based tasks
More actively maintained with frequent updates

Cons of bert4keras

Limited to BERT and related models, less versatile for other NLP tasks
Smaller community and fewer resources compared to LTP
May require more manual configuration for complex tasks

Code Comparison

bert4keras:

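from bert4keras.tokenizers import Tokenizer
from bert4keras.models import build_transformer_model

# Placeholder paths to a downloaded Chinese BERT checkpoint; point them at your own files
config_path = "chinese_L-12_H-768_A-12/bert_config.json"
checkpoint_path = "chinese_L-12_H-768_A-12/bert_model.ckpt"
dict_path = "chinese_L-12_H-768_A-12/vocab.txt"

tokenizer = Tokenizer(dict_path, do_lower_case=True)
model = build_transformer_model(config_path, checkpoint_path)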

LTP:

from ltp import LTP

ltp = LTP("LTP/small")
output = ltp.pipeline(["我爱北京天安门"], tasks=["cws", "pos"])
print(output.cws, output.pos)

Summary

bert4keras is more specialized for BERT-related tasks with a simpler API, while LTP offers a broader range of NLP functionalities. bert4keras may be preferable for BERT-specific projects, while LTP is more suitable for general Chinese NLP tasks. The choice depends on the specific requirements of your project and the level of customization needed.


README


LTP 4

Citation

If you use LTP in your work, you can cite this paper:

@inproceedings{che-etal-2021-n,
    title = "N-{LTP}: An Open-source Neural Language Technology Platform for {C}hinese",
    author = "Che, Wanxiang  and
      Feng, Yunlong  and
      Qin, Libo  and
      Liu, Ting",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-demo.6",
    doi = "10.18653/v1/2021.emnlp-demo.6",
    pages = "42--49",
    abstract = "We introduce N-LTP, an open-source neural language technology platform supporting six fundamental Chinese NLP tasks: lexical analysis (Chinese word segmentation, part-of-speech tagging, and named entity recognition), syntactic parsing (dependency parsing), and semantic parsing (semantic dependency parsing and semantic role labeling). Unlike the existing state-of-the-art toolkits, such as Stanza, that adopt an independent model for each task, N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks. In addition, a knowledge distillation method (Clark et al., 2019) where the single-task model teaches the multi-task model is further introduced to encourage the multi-task model to surpass its single-task teacher. Finally, we provide a collection of easy-to-use APIs and a visualization tool to make users to use and view the processing results more easily and directly. To the best of our knowledge, this is the first toolkit to support six Chinese NLP fundamental tasks. Source code, documentation, and pre-trained models are available at https://github.com/HIT-SCIR/ltp.",
}

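Changelog

4.2.0
- [Structural change] LTP is split into two parts, which makes maintenance and training easier and the structure clearer
  - [Legacy model] In response to users' demand for **inference speed**, the perceptron-based algorithms were rewritten in Rust. Accuracy is on par with LTP 3, speed is 3.55× that of LTP v3, and enabling multithreading raises the speedup to 17.17×; for now, only the three tasks of word segmentation, POS tagging, and named entity recognition are supported
  - [Deep learning model] The PyTorch-based deep learning model, which supports all six tasks (word segmentation / POS tagging / named entity recognition / semantic role labeling / dependency parsing / semantic dependency parsing)
- [Other improvements] Improved model training methods
  - [Both] Training scripts and examples are provided, so users can more easily train customized models on their own private data
  - [Deep learning model] Training is configured with hydra, making it easy to change training parameters and to extend LTP (for example, with a Module from another package)
- [Other changes] The decoding algorithms for word segmentation, dependency parsing (Eisner), and semantic dependency parsing (Eisner) are implemented in Rust for higher speed
- [New feature] Models are hosted on the Hugging Face Hub with automatic, faster downloads, and users can upload their own trained models for inference with LTP
- [Breaking change] Inference now uses the Pipeline API, which makes deeper performance optimizations possible later (for example, SDP and SDPG largely overlap, so reusing shared work can speed up inference); see the Quick Start section on GitHub for usage
4.1.0
- Added custom word segmentation and other features
- Fixed some bugs
4.0.0
- Built on PyTorch, with a native Python interface
- Models with different speed/accuracy trade-offs can be chosen as needed
- Six tasks: word segmentation, POS tagging, named entity recognition, dependency parsing, semantic role labeling, and semantic dependency parsing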
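The changelog mentions Eisner decoding for dependency and semantic dependency parsing. Eisner's algorithm finds the highest-scoring projective dependency tree in O(n³) time from a matrix of arc scores. The Python function below is a textbook version for illustration only: it is not LTP's Rust implementation, it lets ROOT take several children, and the score matrix in the usage example is invented.

```python
from typing import List


def eisner(scores: List[List[float]]) -> List[int]:
    """Return heads[m] for the best projective tree; token 0 is ROOT (heads[0] = -1).

    scores[h][m] is the score of an arc from head h to modifier m.
    """
    n = len(scores)
    NEG = float("-inf")
    # Tables indexed [s][t][d]: span s..t, d=0 means head at t, d=1 means head at s
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    complete_bp = [[[-1, -1] for _ in range(n)] for _ in range(n)]
    incomplete_bp = [[[-1, -1] for _ in range(n)] for _ in range(n)]

    for length in range(1, n):
        for s in range(n - length):
            t = s + length
            # Incomplete spans: join two complete halves and add an arc between s and t
            best, arg = NEG, -1
            for r in range(s, t):
                value = complete[s][r][1] + complete[r + 1][t][0]
                if value > best:
                    best, arg = value, r
            incomplete[s][t][0] = best + scores[t][s]  # arc t -> s
            incomplete[s][t][1] = best + scores[s][t]  # arc s -> t
            incomplete_bp[s][t][0] = incomplete_bp[s][t][1] = arg
            # Complete span headed at t: complete [s, r] + incomplete [r, t]
            best, arg = NEG, -1
            for r in range(s, t):
                value = complete[s][r][0] + incomplete[r][t][0]
                if value > best:
                    best, arg = value, r
            complete[s][t][0], complete_bp[s][t][0] = best, arg
            # Complete span headed at s: incomplete [s, r] + complete [r, t]
            best, arg = NEG, -1
            for r in range(s + 1, t + 1):
                value = incomplete[s][r][1] + complete[r][t][1]
                if value > best:
                    best, arg = value, r
            complete[s][t][1], complete_bp[s][t][1] = best, arg

    heads = [-1] * n

    def backtrack(s: int, t: int, d: int, is_complete: bool) -> None:
        if s == t:
            return
        if is_complete:
            r = complete_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = incomplete_bp[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    # The whole sentence is a complete span headed by ROOT at position 0
    backtrack(0, n - 1, 1, True)
    return heads


# Invented arc scores for ROOT 我 爱 北京 (row = head, column = modifier)
scores = [
    [0.0, 2.0, 10.0, 0.0],   # from ROOT
    [0.0, 0.0, 3.0, 0.0],    # from 我
    [0.0, 10.0, 0.0, 10.0],  # from 爱
    [0.0, 0.0, 0.0, 0.0],    # from 北京
]
print(eisner(scores))  # [-1, 2, 0, 2]
```

The four tables store the best score of each span shape, and the back-pointer tables record the split point so the tree can be rebuilt after the scores are filled in.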

Quick Start

Python

# Method 1: install LTP from the Tsinghua (TUNA) mirror
# 1. Install the PyTorch and Transformers dependencies
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch transformers
# 2. Install LTP
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple ltp ltp-core ltp-extension

# Method 2: switch pip's global index to the mirror first, then install LTP
# 1. Set the global index to the TUNA mirror
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# 2. Install the PyTorch and Transformers dependencies
pip install torch transformers
# 3. Install LTP
pip install ltp ltp-core ltp-extension

import torch
from ltp import LTP

# Models are downloaded from Hugging Face by default; a proxy may be required

ltp = LTP("LTP/small")  # loads the Small model by default
                        # you can also pass a local model path: ltp = LTP("/path/to/your/model")
                        # /path/to/your/model must contain config.json and the other model files

# Move the model to the GPU
if torch.cuda.is_available():
    # ltp.cuda()
    ltp.to("cuda")

# Custom vocabulary
ltp.add_word("汤姆去", freq=2)
ltp.add_words(["外套", "外衣"], freq=2)

# Word segmentation (cws), POS tagging (pos), named entity recognition (ner), semantic role labeling (srl), dependency parsing (dep), semantic dependency parsing as trees (sdp) and as graphs (sdpg)
output = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos", "ner", "srl", "dep", "sdp", "sdpg"])
# Results are returned in dictionary format
print(output.cws)  # print(output[0]) / print(output['cws'])  # index access also works
print(output.pos)
print(output.sdp)

# Perceptron-based segmentation, POS tagging and NER: faster, but slightly less accurate
ltp = LTP("LTP/legacy")
# cws, pos, ner = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "ner"]).to_tuple()  # error: NER needs the POS tagging results
cws, pos, ner = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos", "ner"]).to_tuple()  # to_tuple() converts the result to a tuple
# Results are returned in tuple format
print(cws, pos, ner)
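The legacy models are perceptron-based sequence labelers. Character-based segmenters of this kind conventionally give each character one of four tags: B (begins a word), M (middle of a word), E (ends a word), or S (single-character word). Assuming that scheme, which illustrates the general technique and is not a description of LTP's internal tag set, recovering words from the tags looks like this; `tags_to_words` and the sample tags are hypothetical.

```python
from typing import List


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from per-character B/M/E/S tags."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    buffer = ""
    for char, tag in zip(chars, tags):
        if tag in ("B", "S") and buffer:
            # A new word starts while one is still open (ill-formed tags): flush it
            words.append(buffer)
            buffer = ""
        buffer += char
        if tag in ("E", "S"):
            # E and S both close the current word
            words.append(buffer)
            buffer = ""
    if buffer:
        # Trailing B/M with no closing E: keep the characters as one word
        words.append(buffer)
    return words


sentence = "他叫汤姆去拿外衣。"
tags = ["S", "S", "B", "E", "S", "S", "B", "E", "S"]
print(tags_to_words(sentence, tags))  # ['他', '叫', '汤姆', '去', '拿', '外衣', '。']
```

Flushing on ill-formed sequences guarantees that every input character ends up in exactly one output word.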

Detailed documentation

Rust

use std::fs::File;
use itertools::multizip;
use ltp::{CWSModel, POSModel, NERModel, ModelSerde, Format, Codec};

fn main() -> Result<(), Box<dyn std::error::Error>> {
  let file = File::open("data/legacy-models/cws_model.bin")?;
  let cws: CWSModel = ModelSerde::load(file, Format::AVRO(Codec::Deflate))?;
  let file = File::open("data/legacy-models/pos_model.bin")?;
  let pos: POSModel = ModelSerde::load(file, Format::AVRO(Codec::Deflate))?;
  let file = File::open("data/legacy-models/ner_model.bin")?;
  let ner: NERModel = ModelSerde::load(file, Format::AVRO(Codec::Deflate))?;

  let words = cws.predict("他叫汤姆去拿外衣。")?;
  let pos = pos.predict(&words)?;
  let ner = ner.predict((&words, &pos))?;

  for (w, p, n) in multizip((words, pos, ner)) {
    println!("{}/{}/{}", w, p, n);
  }

  Ok(())
}

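Model performance and download links

| Deep learning model (🤗HF / 🤗HF-mirror) | Word segmentation | POS tagging | Named entities | Semantic roles | Dependency parsing | Semantic dependencies | Speed (sentences/s) |
|---|---|---|---|---|---|---|---|
| 🤗Base / 🤗Base-mirror | 98.7 | 98.5 | 95.4 | 80.6 | 89.5 | 75.2 | 39.12 |
| 🤗Base1 / 🤗Base1-mirror | 99.22 | 98.73 | 96.39 | 79.28 | 89.57 | 76.57 | --.-- |
| 🤗Base2 / 🤗Base2-mirror | 99.18 | 98.69 | 95.97 | 79.49 | 90.19 | 76.62 | --.-- |
| 🤗Small / 🤗Small-mirror | 98.4 | 98.2 | 94.3 | 78.4 | 88.3 | 74.7 | 43.13 |
| 🤗Tiny / 🤗Tiny-mirror | 96.8 | 97.1 | 91.6 | 70.9 | 83.8 | 70.1 | 53.22 |

| Perceptron model (🤗HF / 🤗HF-mirror) | Word segmentation | POS tagging | Named entities | Speed (sentences/s) | Notes |
|---|---|---|---|---|---|
| 🤗Legacy / 🤗Legacy-mirror | 97.93 | 98.41 | 94.28 | 21581.48 | Performance details |

Note: the perceptron model's speed was measured with 16 threads.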
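The speed columns report throughput in sentences per second, so the time to process a corpus is its sentence count divided by that rate. The helper below does this back-of-the-envelope arithmetic with the figures from the tables; real throughput will vary with hardware, sentence length, batch size, and thread count, and the one-million-sentence corpus is an arbitrary example.

```python
def hours_to_process(num_sentences: int, sentences_per_second: float) -> float:
    """Estimated wall-clock hours at a constant throughput."""
    if sentences_per_second <= 0:
        raise ValueError("throughput must be positive")
    return num_sentences / sentences_per_second / 3600


# Throughput figures from the tables (sentences per second)
speeds = {"Base": 39.12, "Small": 43.13, "Tiny": 53.22, "Legacy (16 threads)": 21581.48}
for name, rate in speeds.items():
    print(f"{name}: {hours_to_process(1_000_000, rate):.2f} h per million sentences")
# Base ≈ 7.10 h, Small ≈ 6.44 h, Tiny ≈ 5.22 h, Legacy ≈ 0.01 h (about 46 s)
```

The roughly 500× gap between the Small and Legacy rows is the practical trade-off behind the accuracy differences in the tables.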

How to download the models

# Download over HTTP
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/LTP/base

# Download over SSH
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
git clone git@hf.co:LTP/base

# Download the archive
wget http://39.96.43.154/ltp/v4/base.tgz
mkdir -p base  # tar -C requires the target directory to exist
tar -zxvf base.tgz -C base

How to use a downloaded model

from ltp import LTP

# Pass the path where the model was downloaded or extracted
# For example, if the base model's folder is "path/to/base",
#      then "path/to/base" must contain "config.json"
ltp = LTP("path/to/base")
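Because the model directory must contain config.json, a pre-flight check gives a clearer error than a failed load. `looks_like_ltp_model_dir` is a hypothetical helper, not part of LTP, and it only checks for config.json, not for the weight files.

```python
from pathlib import Path


def looks_like_ltp_model_dir(path: str) -> bool:
    """True if `path` is a directory containing config.json."""
    directory = Path(path)
    return directory.is_dir() and (directory / "config.json").is_file()


model_path = "path/to/base"  # same placeholder path as in the example above
if looks_like_ltp_model_dir(model_path):
    print(f"{model_path} looks ready to load")
else:
    print(f"{model_path} has no config.json; check where the model was downloaded or extracted")
```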

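Building the Wheel package

make bdist

Other language bindings

Perceptron algorithms

Rust
C/C++

Deep learning algorithms

Author information

车万翔 (Wanxiang Che) <car@ir.hit.edu.cn>
冯云龙 (Yunlong Feng) <ylfeng@ir.hit.edu.cn>

License

The source code of the Language Technology Platform is freely available to universities in China and abroad, to the institutes of the Chinese Academy of Sciences, and to individual researchers. However, if these institutions or individuals use the platform for commercial purposes (for example, in enterprise collaboration projects), a fee is required.
Enterprises and other institutions not listed above must pay a fee to use the platform.
For all fee-related matters, please email car@ir.hit.edu.cn.
If you publish papers or obtain research results based on LTP, please state "This work used the Language Technology Platform (LTP) developed by the Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology" when publishing the papers or reporting the results, and also email car@ir.hit.edu.cn with the title and venue of the paper or result.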
