Quick Overview
python-pinyin is a Python library, published on PyPI as pypinyin, for converting Chinese characters to pinyin (the official romanization system for Standard Chinese). It supports multiple pinyin styles and handles both simplified and traditional Chinese characters.
Pros
- Supports multiple pinyin styles (e.g., NORMAL, TONE, TONE2, TONE3, INITIALS, FIRST_LETTER)
- Handles both simplified and traditional Chinese characters
- Customizable pronunciation dictionary
- Actively maintained with regular updates
Cons
- May have occasional inaccuracies for uncommon or ambiguous characters
- Limited support for non-standard Chinese characters or dialects
- Performance may degrade with very large text inputs
- Corrections beyond the bundled dictionaries rely on custom entries or the separate pypinyin-dict package
Code Examples
Converting Chinese characters to pinyin:
from pypinyin import pinyin, Style
text = "中国"
result = pinyin(text)
print(result) # Output: [['zhōng'], ['guó']]
Using different pinyin styles:
from pypinyin import pinyin, Style
text = "你好"
result = pinyin(text, style=Style.TONE2)
print(result) # Output: [['ni3'], ['ha3o']]
Converting to initials only:
from pypinyin import pinyin, Style
text = "北京"
result = pinyin(text, style=Style.INITIALS)
print(result) # Output: [['b'], ['j']]
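The examples above all return a nested list: one inner list per character, holding that character's readings. The following is a minimal sketch of flattening that shape into a plain string. It works on a literal copied from the README's `heteronym=True` output rather than a live call, and `first_readings` is a hypothetical helper, not part of pypinyin.

```python
def first_readings(nested):
    """Keep the first reading from each per-character list."""
    return [readings[0] for readings in nested if readings]


# Shape of pinyin('中心', heteronym=True) as shown in the project README.
result = [["zhōng", "zhòng"], ["xīn"]]

print(" ".join(first_readings(result)))  # zhōng xīn
```

When only one reading per character is needed, `lazy_pinyin` already returns a flat list, so a helper like this is mainly useful after a `heteronym=True` call.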
Getting Started
- Install the library using pip:
pip install pypinyin
- Import and use the library in your Python code:
from pypinyin import pinyin, lazy_pinyin, Style
# Basic usage
text = "你好世界"
print(pinyin(text)) # Output: [['nǐ'], ['hǎo'], ['shì'], ['jiè']]
# Lazy pinyin (without tones)
print(lazy_pinyin(text)) # Output: ['ni', 'hao', 'shi', 'jie']
# Using different styles
print(pinyin(text, style=Style.TONE2)) # Output: [['ni3'], ['ha3o'], ['shi4'], ['jie4']]
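One common use of the flat list returned by `lazy_pinyin` is as a sort key, so that Chinese strings are ordered by pronunciation. The sketch below assumes that ordering by toneless syllables is good enough for the task. `sort_by_pinyin` and its `romanize` parameter are hypothetical names, introduced here so the converter can be swapped out.

```python
def sort_by_pinyin(words, romanize=None):
    """Return words ordered by their toneless pinyin syllables."""
    if romanize is None:
        # Imported lazily so the helper can also run with a stand-in converter.
        from pypinyin import lazy_pinyin
        romanize = lazy_pinyin
    # Each key is a list of syllables; lists compare element by element.
    return sorted(words, key=romanize)
```

With pypinyin installed, `sort_by_pinyin(["上海", "北京", "广州"])` would compare the syllable lists for each word. Passing a custom `romanize` callable keeps the ordering logic independent of the library.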
Competitor Comparisons
hotoo/pinyin: 汉字拼音 ➜ hàn zì pīn yīn (Chinese characters to pinyin)
Pros of pinyin
- Written in JavaScript, making it suitable for web-based applications
- Supports multiple output formats (pinyin, initial, final, tone)
- Includes a command-line interface for easy use outside of code
Cons of pinyin
- Less comprehensive dictionary compared to python-pinyin
- May have slower performance for large-scale text processing
- Limited customization options for handling edge cases
Code Comparison
python-pinyin:
from pypinyin import pinyin, Style
text = "中文"
result = pinyin(text, style=Style.TONE)
print(result) # [['zhōng'], ['wén']]
pinyin:
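const pinyin = require("pinyin"); // CommonJS import as in pinyin 2.x; newer releases may need a different import form
const text = "中文";
const result = pinyin(text, { style: pinyin.STYLE_TONE2 }); // STYLE_TONE2 appends the tone number to each syllable
console.log(result); // [ [ 'zhong1' ], [ 'wen2' ] ]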
Both libraries offer similar basic functionality for converting Chinese characters to pinyin. python-pinyin provides more advanced features and customization options, while pinyin is more suitable for JavaScript-based projects and offers simpler integration for web applications. The choice between the two depends on the specific project requirements and the development environment.
jieba: 结巴中文分词 (Jieba Chinese word segmentation)
Pros of jieba
- Broader functionality: Offers word segmentation, part-of-speech tagging, and keyword extraction
- More comprehensive: Includes a large dictionary and supports custom dictionaries
- Higher performance: Optimized for speed and efficiency in processing Chinese text
Cons of jieba
- More complex: Requires more setup and configuration for specific tasks
- Larger footprint: Consumes more memory due to its comprehensive dictionary
- Less focused: Not specialized for pinyin conversion like python-pinyin
Code Comparison
python-pinyin:
from pypinyin import pinyin, Style
text = "中文测试"
result = pinyin(text, style=Style.TONE)
print(result)
jieba:
import jieba
text = "中文测试"
words = jieba.cut(text)
print(" ".join(words))
While python-pinyin focuses on converting Chinese characters to pinyin, jieba primarily handles word segmentation. The code examples demonstrate their core functionalities: python-pinyin converts text to pinyin, while jieba segments the text into words.
Both libraries serve different primary purposes but can be valuable tools for processing Chinese text, depending on the specific requirements of your project.
README
Chinese character to pinyin conversion tool (Python version)
|Build| |GitHubAction| |Coverage| |Pypi version| |PyPI downloads| |DOI|
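Converts Chinese characters to pinyin. It can be used for phonetic annotation, sorting, and retrieval of Chinese characters (`Russian translation`_).

The initial version of the code was based on the implementation of `hotoo/pinyin <https://github.com/hotoo/pinyin>`__.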
- Documentation: https://pypinyin.readthedocs.io/
- GitHub: https://github.com/mozillazg/python-pinyin
- License: MIT license
- PyPI: https://pypi.org/project/pypinyin
- Python version: 2.7, pypy, pypy3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11, 3.12
.. contents::
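Features

- Picks the most accurate pinyin by matching against known phrases.
- Supports heteronyms (characters with multiple readings).
- Basic support for traditional Chinese, Bopomofo (Zhuyin), and Wade-Giles romanization.
- Supports many different pinyin and Zhuyin styles.

Installation

.. code-block:: bash

    pip install pypinyin

Usage examples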
.. code-block:: python

    >>> from pypinyin import pinyin, lazy_pinyin, Style
    >>> pinyin('中心')  # or pinyin(['中心']); a list argument means the input is already segmented into words
    [['zhōng'], ['xīn']]
    >>> pinyin('中心', heteronym=True)  # enable heteronym mode
    [['zhōng', 'zhòng'], ['xīn']]
    >>> pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
    [['z'], ['x']]
    >>> pinyin('中心', style=Style.TONE2, heteronym=True)
    [['zho1ng', 'zho4ng'], ['xi1n']]
    >>> pinyin('中心', style=Style.TONE3, heteronym=True)
    [['zhong1', 'zhong4'], ['xin1']]
    >>> pinyin('中心', style=Style.BOPOMOFO)  # Bopomofo (Zhuyin) style
    [['ㄓㄨㄥ'], ['ㄒㄧㄣ']]
    >>> lazy_pinyin('威妥玛拼音', style=Style.WADEGILES)
    ['wei', "t'o", 'ma', "p'in", 'yin']
    >>> lazy_pinyin('中心')  # ignore heteronyms
    ['zhong', 'xin']
    >>> lazy_pinyin('战略', v_to_u=True)  # use ü instead of v
    ['zhan', 'lüe']
    # mark the neutral tone with 5
    >>> lazy_pinyin('衣裳', style=Style.TONE3, neutral_tone_with_five=True)
    ['yi1', 'shang5']
    # tone sandhi: nǐ hǎo -> ní hǎo
    >>> lazy_pinyin('你好', style=Style.TONE2, tone_sandhi=True)
    ['ni2', 'ha3o']
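The last example relies on tone sandhi: a third tone directly before another third tone is pronounced as a second tone. The sketch below illustrates only that basic pairwise rule on TONE3-style syllables (tone number at the end). It is not how pypinyin implements `tone_sandhi`, which may handle longer runs and other sandhi rules differently, and `third_tone_sandhi` is a hypothetical name.

```python
def third_tone_sandhi(syllables):
    """Apply the basic rule: a 3rd tone directly before another 3rd tone becomes a 2nd tone."""
    result = list(syllables)
    for i in range(len(syllables) - 1):
        # Check the original tones so an earlier change does not affect later checks.
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result


print(third_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
```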
**Notes**:

- By default, the result does not indicate which finals carry the neutral tone: a neutral-tone final has no tone mark or number (pass ``neutral_tone_with_five=True`` to mark the neutral tone with ``5``).
- By default, styles without tone marks use ``v`` to represent ``ü`` (pass ``v_to_u=True`` to output ``ü`` instead of ``v``).
- By default, characters that have no pinyin are output unchanged (for custom handling of such characters, see the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#handle-no-pinyin>`__).
- The pinyin of ``嗯`` is not ``en``, as most people assume, and some pinyin have neither an initial nor a final; see the FAQ below.
Command-line tools:

.. code-block:: console

    $ pypinyin 音乐
    yīn yuè
    $ python -m pypinyin.tools.toneconvert to-tone 'zhong4 xin1'
    zhòng xīn
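Documentation

Full documentation: https://pypinyin.readthedocs.io/.

For questions about developing the project's code, see the `developer documentation <https://pypinyin.readthedocs.io/zh_CN/develop/develop.html>`__.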
FAQ
Wrong pinyin?
+++++++++++++++++++++++++++++

The accuracy of the pinyin can be improved in the following ways:

- Correct the results with a custom phrase pinyin dictionary or a custom single-character pinyin dictionary; see the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#custom-dict>`__.

.. code-block:: python

    >>> from pypinyin import load_phrases_dict, load_single_dict
    >>> load_phrases_dict({'桔子': [['jú'], ['zǐ']]})  # add the phrase "桔子"
    >>> load_single_dict({ord('还'): 'hái,huán'})  # reorder the readings of "还" or override its default pinyin

- Alternatively, use the custom pinyin dictionaries provided by the `pypinyin-dict <https://github.com/mozillazg/pypinyin-dict>`__ project to correct the results.

.. code-block:: python

    # improve results with the pinyin data in cc_cedict.txt from the phrase-pinyin-data project
    >>> from pypinyin_dict.phrase_pinyin_data import cc_cedict
    >>> cc_cedict.load()
    # improve results with the pinyin data in kXHC1983.txt from the pinyin-data project
    >>> from pypinyin_dict.pinyin_data import kxhc1983
    >>> kxhc1983.load()
- If the error is caused by word segmentation, segment the text with another word segmentation module first, then pass the resulting list of words to the function:

.. code-block:: python

    >>> # segment with another module, such as jieba,
    >>> # or apply another segmentation algorithm to the phrase data in phrases_dict.py
    >>> import jieba
    >>> from pypinyin import pinyin
    >>> words = list(jieba.cut('每股24.67美元的确定性协议'))
    >>> pinyin(words)

- If you would like to improve accuracy with a trained model, take a look at the `pypinyin-g2pW <https://github.com/mozillazg/pypinyin-g2pW>`__ project.
Why are y, w, and yu not initials?
++++++++++++++++++++++++++++++++++++++++++++

.. code-block:: python

    >>> from pypinyin import Style, pinyin
    >>> pinyin('下雨天', style=Style.INITIALS)
    [['x'], [''], ['t']]

Because according to the `Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.gov.cn/jyb_sjzl/ziliao/A19/195802/t19580201_186000.html>`__, y, w, and ü (yu) are not initials.

    In the initials style (INITIALS), characters such as "雨", "我", and "圆" return an empty string, because according to the
    `Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.gov.cn/jyb_sjzl/ziliao/A19/195802/t19580201_186000.html>`__,
    y, w, and ü (yu) are not initials; y or w is added only when certain finals have no initial, and ü has its own specific rules. -- @hotoo

    **If this causes you trouble, also watch out for characters that have no initial (such as "啊", "饿", "按", and "昂").
    In that case, what you need may be the first-letter style (FIRST_LETTER).** -- @hotoo

    References: `hotoo/pinyin#57 <https://github.com/hotoo/pinyin/issues/57>`__,
    `#22 <https://github.com/mozillazg/python-pinyin/pull/22>`__,
    `#27 <https://github.com/mozillazg/python-pinyin/issues/27>`__,
    `#44 <https://github.com/mozillazg/python-pinyin/issues/44>`__

If this is not the behavior you want and you do want y treated as an initial, pass ``strict=False``, which may match your expectations:

.. code-block:: python

    >>> from pypinyin import Style, pinyin
    >>> pinyin('下雨天', style=Style.INITIALS)
    [['x'], [''], ['t']]
    >>> pinyin('下雨天', style=Style.INITIALS, strict=False)
    [['x'], ['y'], ['t']]

See `the effect of the strict parameter <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#strict>`__ for details.
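Are there pinyin with neither an initial nor a final?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Yes. In ``strict=True`` mode, a very small number of pinyin have neither an initial nor a final.
For example, the following pinyin (from the characters ``嗯``, ``呒``, ``呣``, and ``唔``)::

    ń ńg ňg ǹg ň ǹ m̄ ḿ m̀

Note in particular that none of the pinyin of ``嗯`` has an initial or a final, and that the default pinyin of ``呣`` has neither an initial nor a final.
See `#109`_ `#259`_ `#284`_ for details.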
How do I convert pinyin from one style to another?
++++++++++++++++++++++++++++++++++++++++++++++++++++

The helper functions in the ``pypinyin.contrib.tone_convert`` module convert standard pinyin into other styles.
For example, they can convert ``zhōng`` to ``zhong``, or extract the initial or the final from a pinyin:

.. code-block:: python

    >>> from pypinyin.contrib.tone_convert import to_normal, to_tone, to_initials, to_finals
    >>> to_normal('zhōng')
    'zhong'
    >>> to_tone('zhong1')
    'zhōng'
    >>> to_initials('zhōng')
    'zh'
    >>> to_finals('zhōng')
    'ong'

For more pinyin conversion helpers, see the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/contrib.html#tone-convert>`__ of the ``pypinyin.contrib.tone_convert`` module.
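To make the relationship between the numbered and tone-mark styles concrete, here is a pure-Python sketch of the usual mark-placement rule: the mark goes on `a` or `e` if present, on the `o` of `ou`, and otherwise on the last vowel. This is an illustration only, not the implementation behind `to_tone`, and `numbered_to_marked` is a hypothetical name. It expects the tone number at the end of the syllable, as in the TONE3 style.

```python
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable):
    """Convert e.g. 'zhong1' to 'zhōng' using the usual mark-placement rule."""
    if not syllable or syllable[-1] not in "012345":
        return syllable  # no trailing tone number: leave unchanged
    body, tone = syllable[:-1], int(syllable[-1])
    if tone in (0, 5):
        return body  # the neutral tone carries no mark
    if "a" in body:
        index = body.index("a")
    elif "e" in body:
        index = body.index("e")
    elif "ou" in body:
        index = body.index("o")
    else:
        vowels = [i for i, ch in enumerate(body) if ch in TONE_MARKS]
        if not vowels:
            return body  # syllables without a vowel are outside this sketch
        index = vowels[-1]
    return body[:index] + TONE_MARKS[body[index]][tone - 1] + body[index + 1:]


print(numbered_to_marked("zhong1"))  # zhōng
```

For real conversions, the library's own `tone_convert` helpers are the better choice, since they also cover the other styles and edge cases that this sketch ignores.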
How do I reduce memory usage?
++++++++++++++++++++++++++++++

If pinyin accuracy is not a particular concern, memory can be saved by setting the environment variables ``PYPINYIN_NO_PHRASES`` and ``PYPINYIN_NO_DICT_COPY``.
See the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/faq.html#no-phrases>`__ for details.
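The two environment variables can also be set from Python, as sketched below. This assumes that any non-empty value enables each flag and that the flags are read when pypinyin is first imported, so they are set before the import. `to_plain_pinyin` is a hypothetical wrapper name.

```python
import os

# Assumption: the flags are read when pypinyin is first imported, so set them beforehand.
os.environ.setdefault("PYPINYIN_NO_PHRASES", "true")
os.environ.setdefault("PYPINYIN_NO_DICT_COPY", "true")


def to_plain_pinyin(text):
    """Convert text with whatever dictionaries the flags above leave enabled."""
    from pypinyin import lazy_pinyin  # deferred until the environment is configured
    return lazy_pinyin(text)
```

Setting the variables in the shell or the process manager achieves the same thing and avoids any dependence on import order.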
For more FAQs, see the `FAQ <https://pypinyin.readthedocs.io/zh_CN/master/faq.html>`__ section of the documentation.

.. _#13 : https://github.com/mozillazg/python-pinyin/issues/113
.. _effect of the strict parameter: https://pypinyin.readthedocs.io/zh_CN/master/usage.html#strict
Pinyin data

- Pinyin for single characters comes from `pinyin-data`_
- Pinyin for phrases comes from `phrase-pinyin-data`_
- Initials and finals follow the `Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.gov.cn/jyb_sjzl/ziliao/A19/195802/t19580201_186000.html>`__
Related Projects
- `hotoo/pinyin`__: Chinese character to pinyin conversion tool, Node.js/JavaScript version.
- `mozillazg/go-pinyin`__: Chinese character to pinyin conversion tool, Go version.
- `mozillazg/rust-pinyin`__: Chinese character to pinyin conversion tool, Rust version.
- `wolfgitpr/cpp-pinyin`__: Chinese character to pinyin conversion tool, C++ version.
- `wolfgitpr/csharp-pinyin`__: Chinese character to pinyin conversion tool, C# version.

__ https://github.com/hotoo/pinyin
__ https://github.com/mozillazg/go-pinyin
__ https://github.com/mozillazg/rust-pinyin
__ https://github.com/wolfgitpr/cpp-pinyin
__ https://github.com/wolfgitpr/csharp-pinyin
.. |Build| image:: https://img.shields.io/circleci/project/github/mozillazg/python-pinyin/master.svg
   :target: https://circleci.com/gh/mozillazg/python-pinyin
.. |GitHubAction| image:: https://github.com/mozillazg/python-pinyin/workflows/CI/badge.svg
   :target: https://github.com/mozillazg/python-pinyin/actions
.. |Coverage| image:: https://img.shields.io/coveralls/github/mozillazg/python-pinyin/master.svg
   :target: https://coveralls.io/github/mozillazg/python-pinyin
.. |PyPI version| image:: https://img.shields.io/pypi/v/pypinyin.svg
   :target: https://pypi.org/project/pypinyin/
.. |DOI| image:: https://zenodo.org/badge/12830126.svg
   :target: https://zenodo.org/badge/latestdoi/12830126
.. |PyPI downloads| image:: https://img.shields.io/pypi/dm/pypinyin.svg
   :target: https://pypi.org/project/pypinyin/
.. _Russian translation: https://github.com/mozillazg/python-pinyin/blob/master/README_ru.rst
.. _pinyin-data: https://github.com/mozillazg/pinyin-data
.. _phrase-pinyin-data: https://github.com/mozillazg/phrase-pinyin-data
.. _developer documentation: https://pypinyin.readthedocs.io/zh_CN/develop/develop.html
.. _#109: https://github.com/mozillazg/python-pinyin/issues/109
.. _#259: https://github.com/mozillazg/python-pinyin/issues/259
.. _#284: https://github.com/mozillazg/python-pinyin/issues/284