Quick Overview
OpenCC (Open Chinese Convert) is an open-source project for converting between Traditional and Simplified Chinese. It provides a standardized way to handle Chinese text conversion with high accuracy and customizable conversion rules. OpenCC supports various Chinese variants and can be used as both a command-line tool and a programming library.
Pros
- High accuracy in Chinese text conversion
- Customizable conversion rules and dictionaries
- Supports multiple Chinese variants (Mainland China, Taiwan, Hong Kong, etc.)
- Available as both a command-line tool and a programming library
Cons
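- Limited to script and variant conversion for Chinese (plus Japanese Shinjitai); it does not translate between languages such as Mandarin and Cantonese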
- May require some technical knowledge to customize conversion rules
- Dictionary loading and phrase-level matching may make it slower than a plain character-for-character mapping
- Documentation could be more comprehensive for non-Chinese speakers
Code Examples
- Basic conversion from Simplified to Traditional Chinese:
import opencc
converter = opencc.OpenCC('s2t.json')
simplified = '汉字'
traditional = converter.convert(simplified)
print(traditional) # Output: 漢字
- Converting a file from Traditional to Simplified Chinese:
import opencc
converter = opencc.OpenCC('t2s.json')
with open('input.txt', 'r', encoding='utf-8') as input_file:
    content = input_file.read()
converted = converter.convert(content)
with open('output.txt', 'w', encoding='utf-8') as output_file:
    output_file.write(converted)
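- Custom conversion using a user-defined configuration file:
import opencc
# Assumption: 'my_config.json' is a hypothetical custom configuration file in
# OpenCC's JSON config format whose dictionary maps 香港人 to 臺灣人.
# This relies on the official opencc bindings accepting a path to a config file.
converter = opencc.OpenCC('my_config.json')
text = '我是香港人'
converted = converter.convert(text)
print(converted)  # With the dictionary described above: 我是臺灣人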
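The file example above reads the whole input into memory. For large files, a line-by-line variant may be preferable. The following is a minimal sketch, not part of OpenCC: `convert_stream` and `StubConverter` are hypothetical names, and the helper assumes only that the converter exposes `convert(str) -> str` as in the examples above. It also assumes that no phrase to be converted spans a line break.

```python
import io


def convert_stream(converter, src, dst):
    """Convert text line by line from src to dst; return the number of lines."""
    count = 0
    for line in src:
        dst.write(converter.convert(line))
        count += 1
    return count


class StubConverter:
    """Stand-in for opencc.OpenCC so the sketch runs without OpenCC installed."""

    def convert(self, text):
        # Toy mapping for demonstration only; not a real conversion table.
        return text.replace('汉', '漢')


src = io.StringIO('汉字\n汉语\n')
dst = io.StringIO()
lines = convert_stream(StubConverter(), src, dst)
print(lines, repr(dst.getvalue()))
```

With OpenCC installed, the same helper could be called with an `opencc.OpenCC` instance and two file objects opened with `encoding='utf-8'`.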
Getting Started
To use OpenCC in your Python project, follow these steps:
- Install OpenCC using pip:
pip install opencc-python-reimplemented
- Import the library and create a converter:
import opencc
converter = opencc.OpenCC('s2t.json')  # Simplified to Traditional
- Convert text:
simplified = '开放中文转换'
traditional = converter.convert(simplified)
print(traditional)  # Output: 開放中文轉換
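The benchmark figures in the README below report initialization times of up to roughly 25 ms for some configurations, so it may be worth creating each converter once and reusing it. A minimal sketch of that idea follows; `make_cached_factory` and `fake_factory` are hypothetical names, and the counting stand-in is used only so the example runs without OpenCC installed.

```python
from functools import lru_cache


def make_cached_factory(factory):
    """Wrap a converter factory so each config name is constructed only once."""
    @lru_cache(maxsize=None)
    def get_converter(config):
        return factory(config)
    return get_converter


# Counting stand-in for demonstration; with OpenCC installed one would pass
# opencc.OpenCC here instead of fake_factory.
calls = []


def fake_factory(config):
    calls.append(config)
    return object()


get_converter = make_cached_factory(fake_factory)
a = get_converter('s2t.json')
b = get_converter('s2t.json')
print(a is b, calls)
```

Whether a cached converter can be shared safely across threads is not covered here and would need to be checked against the bindings in use.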
README
Open Chinese Convert 開放中文轉換
Introduction 介紹
Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for conversion between Traditional Chinese, Simplified Chinese and Japanese Kanji (Shinjitai). It supports character-level and phrase-level conversion, character variant conversion and regional idioms among Mainland China, Taiwan and Hong Kong. It is not a translation tool between Mandarin and Cantonese, etc.
中文簡繁轉換開源項目，支持詞彙級別的轉換、異體字轉換和地區習慣用詞轉換（中國大陸、臺灣、香港、日本新字體）。不提供普通話與粵語的轉換。
Discussion (Telegram): https://t.me/open_chinese_convert
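To illustrate the difference between the character-level and phrase-level conversion mentioned in the introduction, here is a toy sketch using greedy longest-match lookup. It is not OpenCC's actual implementation: the two miniature dictionaries and the function name `convert` are assumptions made for this example only.

```python
# Hypothetical miniature dictionaries; real OpenCC dictionaries are far larger.
PHRASES = {'鼠标': '滑鼠'}          # phrase-level entry (regional idiom)
CHARS = {'汉': '漢', '标': '標'}    # character-level entries


def convert(text, phrases=PHRASES, chars=CHARS):
    """Greedy longest match: try phrases first, then fall back to single characters."""
    max_len = max((len(k) for k in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to length 2.
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += size
                break
        else:
            # No phrase matched at this position; convert one character.
            out.append(chars.get(text[i], text[i]))
            i += 1
    return ''.join(out)


print(convert('汉字鼠标'))
```

In this sketch the phrase entry takes precedence, so 标 inside 鼠标 is never converted on its own; outside that phrase it falls through to the character table.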
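Features 特點
- Strictly distinguishes "one Simplified character to many Traditional characters" (一簡對多繁) from "one Simplified character to many variant characters" (一簡對多異).
- Fully compatible with variant characters, with support for dynamic substitution.
- One-to-many Simplified-to-Traditional entries are strictly reviewed, following the principle 「能分則不分」.
- Supports conversion of variant characters and regional idioms among Mainland China, Taiwan and Hong Kong, for example 「裏」 and 「裡」, or 「鼠標」 and 「滑鼠」.
- Dictionaries are fully separated from the library code, so they can be freely modified, imported and extended.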
Installation 安裝
Package Managers 包管理器
Prebuilt 預編譯
- Windows (x86_64): Latest build
- Windows (x86): Latest build
Usage 使用
Online demo 線上轉換展示
Warning: This is NOT an API. You will be banned if you make calls programmatically.
Node.js
npm install opencc
JavaScript
const OpenCC = require('opencc');
const converter = new OpenCC('s2t.json');
converter.convertPromise("汉字").then(converted => {
  console.log(converted);  // 漢字
});
TypeScript
import { OpenCC } from 'opencc';
async function main() {
  const converter: OpenCC = new OpenCC('s2t.json');
  const result: string = await converter.convertPromise('汉字');
  console.log(result);
}
See demo.js and ts-demo.ts.
Python
pip install opencc
(Windows, Linux, Mac)
import opencc
converter = opencc.OpenCC('s2t.json')
converter.convert('汉字')  # 漢字
C++
#include "opencc.h"
int main() {
  const opencc::SimpleConverter converter("s2t.json");
  converter.Convert("汉字");  // 漢字
  return 0;
}
C
#include <string.h>  /* for strlen */
#include "opencc.h"
int main() {
  opencc_t opencc = opencc_open("s2t.json");
  const char* input = "汉字";
  char* converted = opencc_convert_utf8(opencc, input, strlen(input));  // 漢字
  opencc_convert_utf8_free(converted);
  opencc_close(opencc);
  return 0;
}
Document 文檔: https://byvoid.github.io/OpenCC/
Command Line
opencc --help
opencc_dict --help
opencc_phrase_extract --help
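The command-line tool can also be driven from a script. The sketch below assumes the `opencc` executable accepts `-i` (input file), `-o` (output file) and `-c` (configuration), which should be confirmed with `opencc --help` on the installed version. The helper names `build_opencc_command` and `convert_file` are hypothetical.

```python
import shutil
import subprocess


def build_opencc_command(input_path, output_path, config='s2t.json'):
    """Return the argument list for converting a file with the opencc CLI."""
    return ['opencc', '-i', input_path, '-o', output_path, '-c', config]


def convert_file(input_path, output_path, config='s2t.json'):
    """Run opencc if it is on PATH; raise a clear error otherwise."""
    if shutil.which('opencc') is None:
        raise FileNotFoundError('opencc executable not found on PATH')
    subprocess.run(build_opencc_command(input_path, output_path, config), check=True)


# Only the command is printed here, so the sketch runs without opencc installed.
print(build_opencc_command('input.txt', 'output.txt', 't2s.json'))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or non-ASCII characters.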
Others (Unofficial)
- Swift (iOS): SwiftyOpenCC
- iOSOpenCC (pod): iOSOpenCC
- Java: opencc4j
- Android: android-opencc
- PHP: opencc4php
- Pure JavaScript: opencc-js
- WebAssembly: wasm-opencc
- Browser Extension: opencc-extension
- Go (Pure): OpenCC for Go
- Dart (native-assets): opencc-dart
Configurations 配置文件
預設配置文件
- s2t.json Simplified Chinese to Traditional Chinese 簡體到繁體
- t2s.json Traditional Chinese to Simplified Chinese 繁體到簡體
- s2tw.json Simplified Chinese to Traditional Chinese (Taiwan Standard) 簡體到臺灣正體
- tw2s.json Traditional Chinese (Taiwan Standard) to Simplified Chinese 臺灣正體到簡體
- s2hk.json Simplified Chinese to Traditional Chinese (Hong Kong variant) 簡體到香港繁體
- hk2s.json Traditional Chinese (Hong Kong variant) to Simplified Chinese 香港繁體到簡體
- s2twp.json Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom 簡體到繁體（臺灣正體標準）並轉換爲臺灣常用詞彙
- tw2sp.json Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom 繁體（臺灣正體標準）到簡體並轉換爲中國大陸常用詞彙
- t2tw.json Traditional Chinese (OpenCC Standard) to Taiwan Standard 繁體（OpenCC 標準）到臺灣正體
- hk2t.json Traditional Chinese (Hong Kong variant) to Traditional Chinese 香港繁體到繁體（OpenCC 標準）
- t2hk.json Traditional Chinese (OpenCC Standard) to Hong Kong variant 繁體（OpenCC 標準）到香港繁體
- t2jp.json Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai) 繁體（OpenCC 標準，舊字體）到日文新字體
- jp2t.json New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai) 日文新字體到繁體（OpenCC 標準，舊字體）
- tw2t.json Traditional Chinese (Taiwan standard) to Traditional Chinese 臺灣正體到繁體（OpenCC 標準）
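The configuration names follow a `<source>2<target>` pattern, which makes it easy to pick one programmatically. The sketch below is an illustration only: the function `plan` is hypothetical, and the two-hop fallback through `t` (the OpenCC standard Traditional form) is an assumption about how listed configurations might be chained, not something the README states. Whether chaining gives acceptable results should be checked on real text.

```python
# Configuration names from the list above, without the .json suffix.
CONFIGS = {
    's2t', 't2s', 's2tw', 'tw2s', 's2hk', 'hk2s', 's2twp', 'tw2sp',
    't2tw', 'hk2t', 't2hk', 't2jp', 'jp2t', 'tw2t',
}


def plan(source, target):
    """Return the config files to apply, in order, to go from source to target.

    Uses a direct configuration when one is listed; otherwise tries two hops
    via 't'. Raises ValueError if neither route exists.
    """
    direct = f'{source}2{target}'
    if direct in CONFIGS:
        return [direct + '.json']
    first, second = f'{source}2t', f't2{target}'
    if first in CONFIGS and second in CONFIGS:
        return [first + '.json', second + '.json']
    raise ValueError(f'no listed configuration path from {source!r} to {target!r}')


print(plan('s', 'tw'))
print(plan('hk', 'tw'))
```

Each returned file name could then be passed to a separate converter, with the output of one conversion fed into the next.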
Build 編譯
Build with CMake
Linux & Mac OS X
g++ 4.6+ or clang 3.2+ is required.
make
Windows Visual Studio:
build.cmd
Build with Bazel
bazel build //:opencc
bazel test --test_output=all //src/... //data/... //test/...
Test 測試
Linux & Mac OS X
make test
Windows Visual Studio:
test.cmd
Benchmark 基準測試
make benchmark
Example results (from GitHub CI):
1: ------------------------------------------------------------------
1: Benchmark Time CPU Iterations
1: ------------------------------------------------------------------
1: BM_Initialization/hk2s 1.56 ms 1.56 ms 442
1: BM_Initialization/hk2t 0.144 ms 0.144 ms 4878
1: BM_Initialization/jp2t 0.260 ms 0.260 ms 2604
1: BM_Initialization/s2hk 23.8 ms 23.8 ms 29
1: BM_Initialization/s2t 25.6 ms 25.6 ms 28
1: BM_Initialization/s2tw 24.0 ms 23.9 ms 30
1: BM_Initialization/s2twp 24.6 ms 24.6 ms 28
1: BM_Initialization/t2hk 0.052 ms 0.052 ms 12897
1: BM_Initialization/t2jp 0.141 ms 0.141 ms 5012
1: BM_Initialization/t2s 1.30 ms 1.30 ms 540
1: BM_Initialization/tw2s 1.39 ms 1.39 ms 529
1: BM_Initialization/tw2sp 1.69 ms 1.69 ms 426
1: BM_Initialization/tw2t 0.089 ms 0.089 ms 7707
1: BM_Convert2M 582 ms 582 ms 1
1: BM_Convert/100 1.07 ms 1.07 ms 636
1: BM_Convert/1000 11.0 ms 11.0 ms 67
1: BM_Convert/10000 113 ms 113 ms 6
1: BM_Convert/100000 1176 ms 1176 ms 1
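One way to read the `BM_Convert` rows is to divide each time by its argument and see whether the cost per unit stays roughly constant. The sketch below does that arithmetic with the numbers copied from the table; what the argument actually counts is not stated here, so it is treated as an opaque input size, and `per_unit_ms` is a hypothetical helper name.

```python
# (argument, time in ms) pairs copied from the BM_Convert rows above.
BM_CONVERT = [(100, 1.07), (1000, 11.0), (10000, 113.0), (100000, 1176.0)]


def per_unit_ms(rows):
    """Return the time per unit of the benchmark argument, in milliseconds."""
    return [time_ms / size for size, time_ms in rows]


rates = per_unit_ms(BM_CONVERT)
spread = max(rates) / min(rates)

for (size, _), rate in zip(BM_CONVERT, rates):
    print(f'{size:>6}: {rate * 1000:.2f} microseconds per unit')
print(f'max/min ratio: {spread:.2f}')
```

A ratio close to 1 across three orders of magnitude would suggest that conversion time grows approximately linearly with input size in this benchmark.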
Projects using OpenCC 使用 OpenCC 的項目
Please update if your project is using OpenCC.
License 許可協議
Apache License 2.0
Third Party Library 第三方庫
- darts-clone BSD License
- marisa-trie BSD License
- tclap MIT License
- rapidjson MIT License
- Google Test BSD License
All these libraries are statically linked by default.
Change History 版本歷史
Links 相關鏈接
- Introduction 詳細介紹 https://github.com/BYVoid/OpenCC/wiki/%E7%B7%A3%E7%94%B1
- 現代漢語常用簡繁一對多字義辨析表 (table distinguishing the meanings of one-to-many Simplified/Traditional characters in modern Chinese) http://ytenx.org/byohlyuk/KienxPyan
Contributors 貢獻者
- BYVoid
- 佛振
- Peng Huang
- LI Daobing
- Kefu Chai
- Kan-Ru Chen
- Ma Xiaojun
- Jiang Jiang
- Ruey-Cheng Chen
- Paul Meng
- Lawrence Lau
- 瑾昀
- 內木一郎
- Marguerite Su
- Brian White
- Qijiang Fan
- LEOYoon-Tsaw
- Steven Yao
- Pellaeon Lin
- stony
- steelywing
- 吕旭东
- Weng Xuetian
- Ma Tao
- Heinz Wiesinger
- J.W
- Amo Wu
- Mark Tsai
- Zhe Wang
- sgqy
- Qichuan (Sean) ZHANG
- Flandre Scarlet
- 宋辰文
- iwater
- Xpol Wan
- Weihang Lo
- Cychih
- kyleskimo
- Ryuan Choi
- Prcuvu
- Tony Able
- Xiao Liang
Please feel free to update this list if you have contributed to OpenCC.