python-pinyin

Chinese characters to pinyin (pypinyin)

5,086

623

Quick Overview

python-pinyin is a Python library, distributed on PyPI as pypinyin, for converting Chinese characters to pinyin (the official romanization system for Standard Chinese). It supports multiple pinyin styles and handles both simplified and traditional Chinese characters.

Pros

Supports multiple pinyin styles (e.g., NORMAL, TONE, TONE2, TONE3, INITIALS, FIRST_LETTER)
Handles both simplified and traditional Chinese characters
Customizable pronunciation dictionary
Actively maintained with regular updates

Cons

May have occasional inaccuracies for uncommon or ambiguous characters
Limited support for non-standard Chinese characters or dialects
Performance may degrade with very large text inputs
Requires additional data files for full functionality

Code Examples

Converting Chinese characters to pinyin:

from pypinyin import pinyin, Style

text = "中国"
result = pinyin(text)
print(result)  # Output: [['zhōng'], ['guó']]

Using different pinyin styles:

from pypinyin import pinyin, Style

text = "你好"
result = pinyin(text, style=Style.TONE2)
print(result)  # Output: [['ni3'], ['ha3o']]

Converting to initials only:

from pypinyin import pinyin, Style

text = "北京"
result = pinyin(text, style=Style.INITIALS)
print(result)  # Output: [['b'], ['j']]

Getting Started

Install the library using pip:

pip install pypinyin

Import and use the library in your Python code:

from pypinyin import pinyin, lazy_pinyin, Style

# Basic usage
text = "你好世界"
print(pinyin(text))  # Output: [['nǐ'], ['hǎo'], ['shì'], ['jiè']]

# Lazy pinyin (without tones)
print(lazy_pinyin(text))  # Output: ['ni', 'hao', 'shi', 'jie']

# Using different styles
print(pinyin(text, style=Style.TONE2))  # Output: [['ni3'], ['ha3o'], ['shi4'], ['jie4']]
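The nested list returned by pinyin() holds one inner list per character, and an inner list can contain several candidate readings when heteronym mode is enabled. As a minimal sketch, the result can be flattened into a display string as shown below. join_readings is a hypothetical helper, not part of pypinyin, and it works on literal sample data so it runs without the library installed.

```python
def join_readings(result, sep=" ", alt_sep="/"):
    """Flatten pinyin()-style output into one string.

    `result` is a list of lists; each inner list holds the candidate
    readings for one character. Candidates are joined with `alt_sep`,
    characters with `sep`.
    """
    return sep.join(alt_sep.join(readings) for readings in result)


# Sample data in the same shape as the outputs shown above.
print(join_readings([["ni3"], ["ha3o"], ["shi4"], ["jie4"]]))
print(join_readings([["zhong1", "zhong4"], ["xin1"]]))
```

When only one reading per character is wanted as a flat list, lazy_pinyin already returns that shape.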

Competitor Comparisons

pinyin

7,648

:cn: 汉字拼音 ➜ hàn zì pīn yīn

Pros of pinyin

Written in JavaScript, making it suitable for web-based applications
Supports multiple output formats (pinyin, initial, final, tone)
Includes a command-line interface for easy use outside of code

Cons of pinyin

Less comprehensive dictionary compared to python-pinyin
May have slower performance for large-scale text processing
Limited customization options for handling edge cases

Code Comparison

python-pinyin:

from pypinyin import pinyin, Style

text = "中文"
result = pinyin(text, style=Style.TONE)
print(result)  # [['zhōng'], ['wén']]

pinyin:

const pinyin = require("pinyin");

const text = "中文";
const result = pinyin(text, { style: pinyin.STYLE_TONE2 });
console.log(result);  // [ [ 'zhong1' ], [ 'wen2' ] ]

Both libraries offer similar basic functionality for converting Chinese characters to pinyin. python-pinyin provides more advanced features and customization options, while pinyin is more suitable for JavaScript-based projects and offers simpler integration for web applications. The choice between the two depends on the specific project requirements and the development environment.

jieba

34,028

Jieba Chinese word segmentation

Pros of jieba

Broader functionality: Offers word segmentation, part-of-speech tagging, and keyword extraction
More comprehensive: Includes a large dictionary and supports custom dictionaries
Higher performance: Optimized for speed and efficiency in processing Chinese text

Cons of jieba

More complex: Requires more setup and configuration for specific tasks
Larger footprint: Consumes more memory due to its comprehensive dictionary
Less focused: Not specialized for pinyin conversion like python-pinyin

Code Comparison

python-pinyin:

from pypinyin import pinyin, Style

text = "中文测试"
result = pinyin(text, style=Style.TONE)
print(result)

jieba:

import jieba

text = "中文测试"
words = jieba.cut(text)
print(" ".join(words))

While python-pinyin focuses on converting Chinese characters to pinyin, jieba primarily handles word segmentation. The code examples demonstrate their core functionalities: python-pinyin converts text to pinyin, while jieba segments the text into words.

Both libraries serve different primary purposes but can be valuable tools for processing Chinese text, depending on the specific requirements of your project.

README

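Chinese character to pinyin conversion tool (Python version)

Converts Chinese characters to pinyin. It can be used for phonetic annotation, sorting, and retrieval of Chinese text (`Russian translation`_).

The initial version of the code was based on the implementation of `hotoo/pinyin <https://github.com/hotoo/pinyin>`__.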

Documentation: https://pypinyin.readthedocs.io/
GitHub: https://github.com/mozillazg/python-pinyin
License: MIT license
PyPI: https://pypi.org/project/pypinyin
Python version: 2.7, pypy, pypy3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13
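The introduction lists sorting as one use of the library. A minimal sketch of that idea is to use each word's syllable list as the sort key. sort_by_pinyin and SAMPLE_READINGS below are hypothetical names; the table stands in for pypinyin.lazy_pinyin so that the sketch runs without the library installed.

```python
def sort_by_pinyin(words, to_pinyin):
    """Sort words by the syllable list that `to_pinyin` returns for each."""
    return sorted(words, key=to_pinyin)


# Hypothetical stand-in for pypinyin.lazy_pinyin; with pypinyin installed,
# pass lazy_pinyin itself as `to_pinyin` instead of this lookup.
SAMPLE_READINGS = {
    "北京": ["bei", "jing"],
    "上海": ["shang", "hai"],
    "广州": ["guang", "zhou"],
}

cities = ["上海", "广州", "北京"]
print(sort_by_pinyin(cities, SAMPLE_READINGS.__getitem__))
# ['北京', '广州', '上海']
```

Comparing syllable lists rather than joined strings keeps the ordering syllable by syllable, so a short first syllable does not get mixed with the start of the next one.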

.. contents::

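Features

Intelligently matches the most accurate pinyin based on phrases.
Supports heteronyms (characters with more than one reading).
Basic support for traditional Chinese, Bopomofo (Zhuyin), and Wade-Giles romanization.
Supports many different pinyin and Bopomofo styles.

Installation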

.. code-block:: bash

    pip install pypinyin

Usage examples

.. code-block:: python

    >>> from pypinyin import pinyin, lazy_pinyin, Style
    >>> pinyin('中心')  # or pinyin(['中心']); a list argument means the input is already segmented into words
    [['zhōng'], ['xīn']]
    >>> pinyin('中心', heteronym=True)  # enable heteronym mode
    [['zhōng', 'zhòng'], ['xīn']]
    >>> pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
    [['z'], ['x']]
    >>> pinyin('中心', style=Style.TONE2, heteronym=True)
    [['zho1ng', 'zho4ng'], ['xi1n']]
    >>> pinyin('中心', style=Style.TONE3, heteronym=True)
    [['zhong1', 'zhong4'], ['xin1']]
    >>> pinyin('中心', style=Style.BOPOMOFO)  # Bopomofo (Zhuyin) style
    [['ㄓㄨㄥ'], ['ㄒㄧㄣ']]
    >>> lazy_pinyin('威妥玛拼音', style=Style.WADEGILES)
    ['wei', "t'o", 'ma', "p'in", 'yin']
    >>> lazy_pinyin('中心')  # ignore heteronyms
    ['zhong', 'xin']
    >>> lazy_pinyin('战略', v_to_u=True)  # use ü instead of v
    ['zhan', 'lüe']
    >>> # mark the neutral tone with 5
    >>> lazy_pinyin('衣裳', style=Style.TONE3, neutral_tone_with_five=True)
    ['yi1', 'shang5']
    >>> # tone sandhi: nǐ hǎo -> ní hǎo
    >>> lazy_pinyin('你好', style=Style.TONE2, tone_sandhi=True)
    ['ni2', 'ha3o']

Notes:

By default, the result does not indicate which finals carry the neutral tone: a neutral-tone final has no tone mark or number. Pass neutral_tone_with_five=True to mark the neutral tone with 5.
By default, results in the toneless pinyin styles use v to represent ü. Pass v_to_u=True to output ü instead of v.
By default, characters that have no pinyin are output unchanged. For ways to customize how such characters are handled, see the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#handle-no-pinyin>`__.
The pinyin of 嗯 is not the en that most people assume, and some pinyin have neither an initial nor a final. See the FAQ below for details.

Command-line tools:

.. code-block:: console

    $ pypinyin 音乐
    yīn yuè

    $ python -m pypinyin.tools.toneconvert to-tone 'zhong4 xin1'
    zhòng xīn

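Documentation

For detailed documentation, see https://pypinyin.readthedocs.io/.

For questions about developing the project code, see the `developer documentation <https://pypinyin.readthedocs.io/zh_CN/develop/develop.html>`__.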

FAQ

The pinyin is wrong?
++++++++++++++++++++

You can improve pinyin accuracy in the following ways:

Correct the results with a custom phrase pinyin dictionary or a custom single-character pinyin dictionary. See the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#custom-dict>`__ for details.

.. code-block:: python

    >>> from pypinyin import load_phrases_dict, load_single_dict

    >>> load_phrases_dict({'桔子': [['jú'], ['zǐ']]})  # add the phrase "桔子"

    >>> load_single_dict({ord('还'): 'hái,huán'})  # reorder the readings of "还" or override its default pinyin

Alternatively, use the custom pinyin dictionaries provided by the `pypinyin-dict <https://github.com/mozillazg/pypinyin-dict>`__ project to correct the results.

.. code-block:: python

    # improve results with the pinyin data from cc_cedict.txt in the phrase-pinyin-data project
    >>> from pypinyin_dict.phrase_pinyin_data import cc_cedict
    >>> cc_cedict.load()

    # improve results with the pinyin data from kXHC1983.txt in the pinyin-data project
    >>> from pypinyin_dict.pinyin_data import kxhc1983
    >>> kxhc1983.load()

If the errors are caused by word segmentation, first segment the text with another word segmentation module, then pass the resulting list of words to the function:

.. code-block:: python

    >>> # segment with another word segmentation module, such as jieba,
    >>> # or apply another segmentation algorithm to the phrase data in phrases_dict.py
    >>> import jieba
    >>> from pypinyin import pinyin
    >>> words = list(jieba.cut('每股24.67美元的确定性协议'))
    >>> pinyin(words)

If you want to improve pinyin accuracy by training a model, take a look at the `pypinyin-g2pW <https://github.com/mozillazg/pypinyin-g2pW>`__ project.
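To see why a custom phrase entry can override a single-character reading, the following toy model may help. It is a deliberately simplified sketch, not pypinyin's actual matching algorithm: toy_convert is a hypothetical function that tries the longest phrase entry first and falls back to the first listed reading of a single character. The two dictionaries use the same shapes as the load_phrases_dict and load_single_dict arguments shown above.

```python
def toy_convert(text, phrases, singles):
    """Greedy longest-match lookup: phrase entries win over single characters."""
    result = []
    max_len = max((len(p) for p in phrases), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, then shorter ones (length >= 2).
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                # Take the first reading listed for each character of the phrase.
                result.extend(readings[0] for readings in phrases[chunk])
                i += size
                break
        else:
            # No phrase matched: use the first listed reading of the character,
            # or keep the character unchanged if it has no entry.
            entry = singles.get(ord(text[i]))
            result.append(entry.split(",")[0] if entry else text[i])
            i += 1
    return result


phrases = {"桔子": [["jú"], ["zǐ"]]}
singles = {ord("还"): "hái,huán"}
print(toy_convert("还吃桔子", phrases, singles))
```

In this model, reordering the comma-separated readings of a character changes its default, and adding a phrase fixes a character only in that context, which is the distinction between the two dictionary types.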

Why are there no y, w, or yu initials?
++++++++++++++++++++++++++++++++++++++++

.. code-block:: python

    >>> from pypinyin import Style, pinyin
    >>> pinyin('下雨天', style=Style.INITIALS)
    [['x'], [''], ['t']]

Because according to the `Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.gov.cn/jyb_sjzl/ziliao/A19/195802/t19580201_186000.html>`__, y, w, and ü (yu) are not initials.

In the initials style (INITIALS), characters such as "雨", "我", and "圆" return an empty string, because according to the
`Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.gov.cn/jyb_sjzl/ziliao/A19/195802/t19580201_186000.html>`__,
y, w, and ü (yu) are not initials; y or w is added only when certain finals have no initial, and ü has its own specific rules.    -- @hotoo

**If this causes you trouble, also watch out for characters that have no initial at all (such as "啊", "饿", "按", and "昂").
In that case, what you need may be the first-letter style (FIRST_LETTER).**    -- @hotoo

References: `hotoo/pinyin#57 <https://github.com/hotoo/pinyin/issues/57>`__,
`#22 <https://github.com/mozillazg/python-pinyin/pull/22>`__,
`#27 <https://github.com/mozillazg/python-pinyin/issues/27>`__,
`#44 <https://github.com/mozillazg/python-pinyin/issues/44>`__

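If you do want y to be treated as an initial, pass strict=False:

.. code-block:: python

    >>> from pypinyin import Style, pinyin
    >>> pinyin('下雨天', style=Style.INITIALS)
    [['x'], [''], ['t']]
    >>> pinyin('下雨天', style=Style.INITIALS, strict=False)
    [['x'], ['y'], ['t']]

See `the effect of the strict parameter <https://pypinyin.readthedocs.io/zh_CN/master/usage.html#strict>`__ for details.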
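The difference between the two modes can be pictured with a small stand-alone sketch. It is not pypinyin's implementation: split_initial is a hypothetical helper that works on toneless syllables only and ignores edge cases such as syllabic nasals. It assumes the standard list of initials from the scheme cited above, with y and w added only in non-strict mode.

```python
# Initials listed in the Scheme for the Chinese Phonetic Alphabet.
# Two-letter initials come first so "zh" is not mistaken for "z".
STRICT_INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
                   "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_initial(syllable, strict=True):
    """Return the initial of a toneless syllable, or '' if it has none."""
    candidates = STRICT_INITIALS if strict else STRICT_INITIALS + ("y", "w")
    for initial in candidates:
        if syllable.startswith(initial):
            return initial
    return ""


# Toneless syllables for the three characters used in the example above.
for syllable in ("xia", "yu", "tian"):
    print(syllable, repr(split_initial(syllable)),
          repr(split_initial(syllable, strict=False)))
```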

Are there pinyin with neither an initial nor a final?
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

ń ńg ňg ǹg ň ǹ m̄ ḿ m̀

How do I convert pinyin in one style to another style?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. code-block:: python

    >>> from pypinyin.contrib.tone_convert import to_normal, to_tone, to_initials, to_finals
    >>> to_normal('zhōng')
    'zhong'
    >>> to_tone('zhong1')
    'zhōng'
    >>> to_initials('zhōng')
    'zh'
    >>> to_finals('zhōng')
    'ong'

For more helper functions for pinyin conversion, see the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/contrib.html#tone-convert>`__ of the pypinyin.contrib.tone_convert module.
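To illustrate what a conversion such as to_tone('zhong1') involves, here is a stdlib-only sketch. It is not the code used by pypinyin.contrib.tone_convert: numbered_to_marked is a hypothetical function that accepts only syllables with a trailing tone digit and applies the usual placement rule (a or e takes the mark, otherwise the o in ou, otherwise the last vowel).

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}


def numbered_to_marked(syllable):
    """Convert a syllable with a trailing tone digit, e.g. 'zhong4', to tone-mark form."""
    if not syllable or syllable[-1] not in "12345":
        return syllable  # no trailing tone digit: nothing to convert
    base, tone = syllable[:-1].replace("v", "ü"), syllable[-1]
    if tone == "5":
        return base  # the neutral tone carries no mark
    if "a" in base:
        pos = base.index("a")
    elif "e" in base:
        pos = base.index("e")
    elif "ou" in base:
        pos = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in "aeiouü"]
        if not vowels:
            return base  # syllables without a vowel are left unmarked in this sketch
        pos = vowels[-1]
    marked = base[:pos + 1] + TONE_MARKS[tone] + base[pos + 1:]
    # Compose the letter and the combining mark into a single code point.
    return unicodedata.normalize("NFC", marked)


print(" ".join(numbered_to_marked(s) for s in "zhong4 xin1".split()))
```

For real use, the library's own helpers shown above are the better choice, since they also cover the other styles and the vowel-less syllables that this sketch skips.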

How do I reduce memory usage?
++++++++++++++++++++++++++++++

If pinyin accuracy is not critical for your use case, you can save memory by setting the environment variables PYPINYIN_NO_PHRASES and PYPINYIN_NO_DICT_COPY. See the `documentation <https://pypinyin.readthedocs.io/zh_CN/master/faq.html#no-phrases>`__ for details.
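One way to set these variables from Python rather than from the shell is sketched below. Two assumptions are made here and should be checked against the linked documentation: that the variables are read when pypinyin is first imported, so they must be set earlier in the process, and that the value "true" is accepted.

```python
import os

# Assumption: pypinyin reads these variables at import time, so set them first.
# Assumption: "true" is an accepted value; see the project FAQ for the exact form.
os.environ["PYPINYIN_NO_PHRASES"] = "true"
os.environ["PYPINYIN_NO_DICT_COPY"] = "true"

# Import pypinyin only after this point, for example:
# from pypinyin import lazy_pinyin
```

Setting the variables in the shell or in the service configuration before starting Python avoids any dependence on import order.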

For more FAQs, see the `FAQ <https://pypinyin.readthedocs.io/zh_CN/master/faq.html>`__ section of the documentation.

.. _#13 : https://github.com/mozillazg/python-pinyin/issues/113
.. _the effect of the strict parameter: https://pypinyin.readthedocs.io/zh_CN/master/usage.html#strict

Pinyin data

Pinyin for single Chinese characters comes from `pinyin-data`_.
Pinyin for phrases comes from `phrase-pinyin-data`_.
Initials and finals follow the `Scheme for the Chinese Phonetic Alphabet (《汉语拼音方案》) <http://www.moe.gov.cn/jyb_sjzl/ziliao/A19/195802/t19580201_186000.html>`__.

Related Projects

`hotoo/pinyin <https://github.com/hotoo/pinyin>`__: Chinese character to pinyin conversion tool for Node.js/JavaScript.
`mozillazg/go-pinyin <https://github.com/mozillazg/go-pinyin>`__: Chinese character to pinyin conversion tool for Go.
`mozillazg/rust-pinyin <https://github.com/mozillazg/rust-pinyin>`__: Chinese character to pinyin conversion tool for Rust.
`wolfgitpr/cpp-pinyin <https://github.com/wolfgitpr/cpp-pinyin>`__: Chinese character to pinyin conversion tool for C++.
`wolfgitpr/csharp-pinyin <https://github.com/wolfgitpr/csharp-pinyin>`__: Chinese character to pinyin conversion tool for C#.

.. _Russian translation: https://github.com/mozillazg/python-pinyin/blob/master/README_ru.rst
.. _pinyin-data: https://github.com/mozillazg/pinyin-data
.. _phrase-pinyin-data: https://github.com/mozillazg/phrase-pinyin-data
.. _developer documentation: https://pypinyin.readthedocs.io/zh_CN/develop/develop.html
.. _#109: https://github.com/mozillazg/python-pinyin/issues/109
.. _#259: https://github.com/mozillazg/python-pinyin/issues/259
.. _#284: https://github.com/mozillazg/python-pinyin/issues/284
