
crownpku / Awesome-Chinese-NLP

A curated list of resources for Chinese NLP 中文自然语言处理相关资料


Top Related Projects

  • lac - 百度NLP:分词,词性标注,命名实体识别,词重要性 (Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance)

  • jieba - 结巴中文分词 (Jieba Chinese word segmentation)

  • HanLP - Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

  • snownlp - Python library for processing Chinese text

  • bert4keras - Keras implementation of transformers for humans

Quick Overview

Awesome-Chinese-NLP is a curated list of resources for Chinese Natural Language Processing (NLP). It provides a comprehensive collection of tools, datasets, papers, and other materials specifically focused on NLP tasks for the Chinese language. This repository serves as a valuable reference for researchers, developers, and enthusiasts working on Chinese language processing.

Pros

  • Extensive collection of resources covering various aspects of Chinese NLP
  • Regularly updated with new tools, datasets, and research papers
  • Well-organized structure, making it easy to find specific resources
  • Includes both open-source and commercial tools, providing a broad overview of the field

Cons

  • May be overwhelming for beginners due to the large number of resources
  • Some links may become outdated over time if not regularly maintained
  • Lacks detailed explanations or comparisons of the listed resources
  • Primarily in English, which may be a barrier for some Chinese-speaking users

Code Examples

This repository is not a code library but a curated list of resources. Therefore, there are no code examples to provide.

Getting Started

As this is not a code library, there are no specific getting started instructions. However, users can begin by exploring the repository's README file on GitHub, which provides an organized list of resources categorized by different aspects of Chinese NLP, such as:

  1. Chinese Word Segmentation
  2. Named Entity Recognition
  3. Sentiment Analysis
  4. Machine Translation
  5. Information Extraction
  6. Text Summarization
  7. Datasets
  8. Toolkits

Users can click on the links provided in each category to access the relevant resources, tools, or papers.
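For instance, a developer might pick a word segmenter and a sentiment tool from the Toolkits section and combine them. The following is a minimal sketch, assuming the jieba and snownlp packages (both listed in the repository) are installed:

import jieba
from snownlp import SnowNLP

text = "这家餐厅的菜味道很好,服务也很周到。"

# Word segmentation with jieba
print("Segmented:", "/ ".join(jieba.cut(text)))

# Sentiment analysis with SnowNLP: probability that the sentence is positive
print("Sentiment score:", SnowNLP(text).sentiments)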

Competitor Comparisons

lac

百度NLP:分词,词性标注,命名实体识别,词重要性 (Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance)

Pros of lac

  • Focused, production-ready Chinese NLP toolkit
  • Provides pre-trained models for immediate use
  • Optimized for performance and efficiency

Cons of lac

  • Limited scope compared to Awesome-Chinese-NLP's comprehensive resource list
  • Less frequently updated than Awesome-Chinese-NLP
  • Primarily maintained by a single organization (Baidu)

Code comparison

lac:

from LAC import LAC

# Load the combined model (segmentation + POS tagging + NER)
lac = LAC(mode='lac')

text = "我爱北京天安门"
result = lac.run(text)  # returns [word_list, tag_list] for a single sentence
print(result)

Awesome-Chinese-NLP doesn't provide direct code examples but offers links to various tools and libraries. A typical usage might involve selecting a specific tool from the list and implementing it separately.
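As an illustration, here is a minimal sketch of that workflow using THULAC, one of the toolkits listed in the repository (assuming the thulac Python package is installed):

import thulac

# seg_only=True skips part-of-speech tagging and returns plain segmentation
thu = thulac.thulac(seg_only=True)
print(thu.cut("我爱北京天安门", text=True))  # space-separated segmented string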

Summary

lac is a focused, ready-to-use Chinese NLP toolkit optimized for performance, while Awesome-Chinese-NLP serves as a comprehensive resource list for Chinese NLP tools and research. lac offers immediate functionality but has a narrower scope, whereas Awesome-Chinese-NLP provides a broader overview of available resources but requires additional effort to implement specific tools.

jieba

结巴中文分词 (Jieba Chinese word segmentation)

Pros of jieba

  • Focused, specialized tool for Chinese word segmentation
  • Lightweight and easy to integrate into projects
  • Offers multiple segmentation modes (accurate, full, search engine); see the sketch after the code comparison below

Cons of jieba

  • Limited to word segmentation, not a comprehensive NLP toolkit
  • May require additional libraries for advanced NLP tasks
  • Less frequently updated compared to Awesome-Chinese-NLP

Code Comparison

Awesome-Chinese-NLP is a curated list of resources, not a code library. However, here's a basic usage example of jieba:

import jieba

text = "我来到北京清华大学"
# Accurate mode (cut_all=False, the default) suits most text-analysis tasks
seg_list = jieba.cut(text, cut_all=False)
print("Default Mode: " + "/ ".join(seg_list))

Summary

jieba is a specialized Chinese word segmentation tool, offering efficient and accurate text processing for specific tasks. It's lightweight and easy to use but limited in scope compared to the comprehensive resource list provided by Awesome-Chinese-NLP.

Awesome-Chinese-NLP serves as a curated collection of various Chinese NLP tools, datasets, and research papers, providing a broader overview of the field. While it doesn't offer direct functionality, it guides users to a wide range of resources for different NLP tasks.

Choose jieba for quick integration of Chinese word segmentation into your project. Opt for Awesome-Chinese-NLP when seeking a comprehensive guide to Chinese NLP resources and tools for more complex or diverse NLP tasks.

HanLP

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

Pros of HanLP

  • Comprehensive NLP toolkit with a wide range of functionalities
  • Actively maintained with regular updates and improvements
  • Provides both Java and Python interfaces for flexibility

Cons of HanLP

  • Steeper learning curve due to its extensive feature set
  • May be overkill for simple NLP tasks or projects
  • Requires more system resources compared to lightweight alternatives

Code Comparison

HanLP:

from hanlp_restful import HanLPClient

# The RESTful API is accessed through HanLPClient; auth=None uses the free anonymous quota
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh')
print(HanLP.parse('我爱自然语言处理技术!'))

Awesome-Chinese-NLP (using jieba as an example):

import jieba

# jieba.cut returns a generator; join it to see the segmented words
print("/ ".join(jieba.cut('我爱自然语言处理技术!')))

Summary

HanLP is a comprehensive NLP toolkit offering a wide range of functionalities for Chinese language processing. It provides both Java and Python interfaces, making it versatile for different development environments. However, its extensive feature set may result in a steeper learning curve and higher resource requirements.

Awesome-Chinese-NLP, on the other hand, is a curated list of resources and tools for Chinese NLP. It doesn't provide direct functionality but serves as a valuable reference for various Chinese NLP tools and libraries. This makes it more suitable for developers looking to explore different options or find specific tools for their projects.

While HanLP offers a unified solution for many NLP tasks, Awesome-Chinese-NLP allows users to pick and choose from a variety of specialized tools, potentially resulting in a more tailored and lightweight solution for specific use cases.

snownlp

Python library for processing Chinese text

Pros of snownlp

  • Focused tool: Provides a specific set of Chinese NLP functionalities
  • Ready-to-use: Offers pre-trained models for immediate application
  • Lightweight: Easy to install and integrate into projects

Cons of snownlp

  • Limited scope: Covers fewer NLP tasks compared to Awesome-Chinese-NLP
  • Less frequently updated: May not include the latest advancements in Chinese NLP
  • Smaller community: Less active development and support

Code comparison

snownlp:

from snownlp import SnowNLP

s = SnowNLP(u'这是一个测试句子')
print(s.words)         # word segmentation (分词)
print(list(s.tags))    # part-of-speech tagging (词性标注); s.tags is a generator
print(s.sentiments)    # sentiment score (情感分析), probability of being positive

Awesome-Chinese-NLP: (Note: This is a curated list, not a tool, so there's no direct code comparison)
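For completeness, snownlp also bundles keyword extraction and extractive summarization. A minimal sketch, assuming the standard snownlp API:

from snownlp import SnowNLP

doc = SnowNLP(u'自然语言处理是人工智能的一个重要方向。它研究如何让计算机理解和生成人类语言。')
print(doc.keywords(3))  # top-3 keywords
print(doc.summary(1))   # one-sentence extractive summary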

Summary

snownlp is a practical, ready-to-use Chinese NLP library with a focused set of features. It's suitable for quick implementation of basic Chinese NLP tasks. Awesome-Chinese-NLP, on the other hand, is a comprehensive resource list that provides a wider range of tools and research papers for Chinese NLP. It's more suitable for researchers and developers looking to explore various options and stay updated with the latest advancements in the field.

bert4keras

Keras implementation of transformers for humans

Pros of bert4keras

  • Focused specifically on BERT implementation in Keras
  • Provides ready-to-use BERT models for Chinese NLP tasks
  • Offers more hands-on code examples and implementations

Cons of bert4keras

  • Limited to BERT-based models and Keras framework
  • Less comprehensive in covering other Chinese NLP resources
  • May require more technical expertise to use effectively

Code Comparison

bert4keras example:

from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer

# config_path, checkpoint_path and dict_path must point to a downloaded
# pretrained Chinese BERT release (bert_config.json, bert_model.ckpt, vocab.txt)
model = build_transformer_model(config_path, checkpoint_path)
tokenizer = Tokenizer(dict_path)
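A minimal end-to-end sketch of how those objects might be used, assuming Google's chinese_L-12_H-768_A-12 BERT release has been downloaded locally (the paths below are hypothetical):

import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer

# Hypothetical local paths to the pretrained Chinese BERT files
config_path = 'chinese_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'chinese_L-12_H-768_A-12/bert_model.ckpt'
dict_path = 'chinese_L-12_H-768_A-12/vocab.txt'

model = build_transformer_model(config_path, checkpoint_path)
tokenizer = Tokenizer(dict_path, do_lower_case=True)

# Encode a sentence and run it through the encoder
token_ids, segment_ids = tokenizer.encode('我爱自然语言处理技术')
outputs = model.predict([np.array([token_ids]), np.array([segment_ids])])
print(outputs.shape)  # (1, sequence_length, hidden_size) for the plain BERT encoder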

Awesome-Chinese-NLP doesn't provide direct code examples but offers links to various Chinese NLP tools and resources:

## Chinese Word Segmentation

- [THULAC](http://thulac.thunlp.org/) - An Efficient Lexical Analyzer for Chinese
- [Jieba](https://github.com/fxsjy/jieba) - Python Chinese Word Segmentation Module

While Awesome-Chinese-NLP serves as a comprehensive resource hub for Chinese NLP, bert4keras focuses on providing a specific implementation of BERT for Chinese language tasks. Awesome-Chinese-NLP covers a broader range of topics and tools, making it more suitable for researchers and developers looking for an overview of the field. bert4keras, on the other hand, is more appropriate for those specifically interested in using BERT models with Keras for Chinese NLP projects.


README

awesome-chinese-nlp


A curated list of resources for NLP (Natural Language Processing) for Chinese

中文自然语言处理相关资料

(Image credit: Professor 邱锡鹏 Qiu Xipeng, Fudan University)

Contents 列表

1. Chinese NLP Toolkits 中文NLP工具

2. Corpus 中文语料

3. Organizations 中文NLP学术组织及竞赛

4. Industry 中文NLP商业服务

5. Learning Materials 学习资料



Chinese NLP Toolkits 中文NLP工具

Toolkits 综合NLP工具包

  • THULAC 中文词法分析工具包 (Chinese lexical analysis toolkit) by 清华 (Tsinghua University) (C++/Java/Python)

  • NLPIR by 中科院 (Chinese Academy of Sciences) (Java)

  • LTP 语言技术平台 (Language Technology Platform) by 哈工大 (Harbin Institute of Technology) (C++); pyltp is the Python wrapper for LTP

  • FudanNLP by 复旦 (Fudan University) (Java)

  • BaiduLac by 百度 Baidu's open-source lexical analysis tool for Chinese, including word segmentation, part-of-speech tagging & named entity recognition.

  • HanLP (Java)

  • FastNLP (Python) A lightweight NLP processing suite (一款轻量级的 NLP 处理套件)

  • SnowNLP (Python) Python library for processing Chinese text

  • YaYaNLP (Python) A Chinese NLP package written in pure Python, named after 牙牙学语 ("babbling")

  • 小明NLP (Python) Lightweight Chinese natural language processing toolkit

  • DeepNLP (Python) Deep Learning NLP Pipeline implemented on Tensorflow with pretrained Chinese models.

  • chinese_nlp (C++ & Python) Chinese Natural Language Processing tools and examples

  • lightNLP (Python) A deep learning NLP framework based on PyTorch and torchtext

  • Chinese-Annotator (Python) Annotator for Chinese Text Corpus 中文文本标注工具

  • Poplar (Typescript) A web-based annotation tool for natural language processing (NLP)

  • Jiagu (Python) Built on BiLSTM and related models and trained on large-scale corpora; provides common Chinese NLP functions such as word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, knowledge-graph relation extraction, keyword extraction, text summarization, and new-word discovery.

  • SmoothNLP (Python & Java) Focused on interpretable NLP techniques (专注于可解释的NLP技术)

  • FoolNLTK (Python & Java) A Chinese Natural Language Toolkit

Popular NLP Toolkits for English/Multi-Language 常用的英文或支持多语言的NLP工具包

  • CoreNLP by Stanford (Java) A Java suite of core NLP tools.

  • Stanza by Stanford (Python) A Python NLP Library for Many Human Languages

  • NLTK (Python) Natural Language Toolkit

  • spaCy (Python) Industrial-Strength Natural Language Processing, with an online course

  • textacy (Python) NLP, before and after spaCy

  • OpenNLP (Java) A machine learning based toolkit for the processing of natural language text.

  • gensim (Python) Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.

  • Kashgari - A simple and powerful NLP framework; build a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (POS), and text classification tasks. Includes BERT and word2vec embeddings.

Chinese Word Segmentation 中文分词

Information Extraction 信息提取

  • MITIE (C++) library and tools for information extraction

  • Duckling (Haskell) Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

  • IEPY (Python) IEPY is an open source tool for Information Extraction focused on Relation Extraction.

  • Snorkel A training data creation and management system focused on information extraction

  • Neural Relation Extraction implemented with LSTM in TensorFlow

  • A neural network model for Chinese named entity recognition

  • bert-chinese-ner Chinese NER using the pretrained language model BERT (使用预训练语言模型BERT做中文NER)

  • Information-Extraction-Chinese Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT 中文实体识别与关系提取

  • Familia A Toolkit for Industrial Topic Modeling, by Baidu (百度出品)

  • Text Classification All kinds of text classification models and more with deep learning, using Zhihu Q&A data as the test corpus (用知乎问答语料作为测试数据)

  • ComplexEventExtraction Concepts and explicit patterns of Chinese compound events: extraction of conditional, causal, sequential, and reversal events, assembled into an event-logic graph (事理图谱)

  • TextRank4ZH Automatic keyword and summary extraction from Chinese text (从中文文本中自动提取关键词和摘要)

QA & Chatbot 问答和聊天机器人

Multi-Modal Representation & Retrieval 多模态表征与检索

  • Chinese-CLIP (Python) A Chinese multimodal image-text representation pretraining model. Based on OpenAI's CLIP architecture and pretrained on large-scale native Chinese image-text data; multiple model sizes are open-sourced, along with a technical report and a retrieval demo.


Corpus 中文语料


Organizations 中文NLP学术组织及竞赛



Industry 中文NLP商业服务

  • 华为云NLP Huawei Cloud NLP: cloud services for text analysis and mining for enterprises and developers, designed to help users process text efficiently

  • 百度云NLP Baidu Cloud NLP: industry-leading natural language processing technology for high-quality text processing and understanding

  • 阿里云NLP Alibaba Cloud NLP: core tools for text analysis and mining for enterprises and developers

  • 腾讯云NLP Tencent Cloud NLP: built on parallel computing and a distributed crawler system combined with proprietary semantic analysis, covering NLP, transcoding, extraction, and data-crawling needs in one place

  • 讯飞开放平台 iFLYTEK Open Platform: an open AI platform centered on voice interaction

  • 搜狗实验室 Sogou Lab: word segmentation and part-of-speech tagging

  • 玻森数据 BosonNLP (Shanghai Boson Data Technology Co., Ltd.): focused on Chinese semantic analysis

  • 云孚科技: NLP toolkits, knowledge graphs, text mining, dialogue systems, public-opinion analysis, and more

  • 智言科技: an AI company focused on breakthroughs in deep learning and knowledge graph technology

  • 追一科技 Zhuiyi Technology: focused on deep learning and natural language processing



Learning Materials 学习资料