zhconv-rs converts Chinese text among several scripts or regional variants (e.g. zh-TW <-> zh-CN <-> zh-HK <-> zh-Hans <-> zh-Hant). It is built on top of the ZhConversion.php conversion tables from MediaWiki, the same tables used on Chinese Wikipedia.
Web App: https://zhconv.pages.dev/ (powered by WASM)
| Target | Tag | Script | Description |
| -------------------------------------- | --------- | ------- | --------------------------------------------- |
| Simplified Chinese / 简体中文 | `zh-Hans` | SC / 简 | Without substituting region-specific phrases. |
| Traditional Chinese / 繁體中文 | `zh-Hant` | TC / 繁 | Without substituting region-specific phrases. |
| Chinese (Taiwan) / 臺灣正體 | `zh-TW` | TC / 繁 | With Taiwan-specific phrases adapted. |
| Chinese (Hong Kong) / 香港繁體 | `zh-HK` | TC / 繁 | With Hong Kong-specific phrases adapted. |
| Chinese (Macau) / 澳门繁體 | `zh-MO` | TC / 繁 | Same as `zh-HK` for now. |
| Chinese (Mainland China) / 大陆简体 | `zh-CN` | SC / 简 | With mainland China-specific phrases adapted. |
| Chinese (Singapore) / 新加坡简体 | `zh-SG` | SC / 简 | Same as `zh-CN` for now. |
| Chinese (Malaysia) / 大马简体 | `zh-MY` | SC / 简 | Same as `zh-CN` for now. |
Note: `zh-TW` and `zh-HK` are based on `zh-Hant`. `zh-CN` is based on `zh-Hans`. Currently, `zh-MO` shares the same conversion table with `zh-HK` unless additional rules / CGroups are applied; `zh-MY` and `zh-SG` share the same conversion table with `zh-CN` unless additional rules / CGroups are applied.
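The relationships in the note above can be written down as a small lookup. The following is only an illustrative sketch: the `Target` enum and the `shared_table` and `base_script` methods are hypothetical names made up for this example and are not the crate's API. It assumes no additional rules / CGroups are applied.

```rust
/// Hypothetical enum for illustration only; not the zhconv-rs API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    ZhHans,
    ZhHant,
    ZhTW,
    ZhHK,
    ZhMO,
    ZhCN,
    ZhSG,
    ZhMY,
}

impl Target {
    /// The target whose conversion table is currently used
    /// (assuming no additional rules / CGroups).
    fn shared_table(self) -> Target {
        match self {
            Target::ZhMO => Target::ZhHK,                // zh-MO: same as zh-HK for now
            Target::ZhSG | Target::ZhMY => Target::ZhCN, // same as zh-CN for now
            other => other,
        }
    }

    /// The script-level target that the regional target is based on.
    fn base_script(self) -> Target {
        match self.shared_table() {
            Target::ZhTW | Target::ZhHK => Target::ZhHant,
            Target::ZhCN => Target::ZhHans,
            other => other, // zh-Hans and zh-Hant are their own base
        }
    }
}

fn main() {
    let all = [
        Target::ZhHans, Target::ZhHant, Target::ZhTW, Target::ZhHK,
        Target::ZhMO, Target::ZhCN, Target::ZhSG, Target::ZhMY,
    ];
    for t in all {
        println!("{:?}: table = {:?}, base = {:?}", t, t.shared_table(), t.base_script());
    }
}
```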
`cargo bench` on `Intel(R) Xeon(R) CPU @ 2.80GHz` (GitPod), without parsing inline conversion rules:
load zh2Hant time: [45.442 ms 45.946 ms 46.459 ms]
load zh2Hans time: [8.1378 ms 8.3787 ms 8.6414 ms]
load zh2TW time: [60.209 ms 61.261 ms 62.407 ms]
load zh2HK time: [89.457 ms 90.847 ms 92.297 ms]
load zh2MO time: [96.670 ms 98.063 ms 99.586 ms]
load zh2CN time: [27.850 ms 28.520 ms 29.240 ms]
load zh2SG time: [28.175 ms 28.963 ms 29.796 ms]
load zh2MY time: [27.142 ms 27.635 ms 28.143 ms]
zh2TW data54k time: [546.10 us 553.14 us 561.24 us]
zh2CN data54k time: [504.34 us 511.22 us 518.59 us]
zh2Hant data689k time: [3.4375 ms 3.5182 ms 3.6013 ms]
zh2TW data689k time: [3.6062 ms 3.6784 ms 3.7545 ms]
zh2Hant data3185k time: [62.457 ms 64.257 ms 66.099 ms]
zh2TW data3185k time: [60.217 ms 61.348 ms 62.556 ms]
zh2TW data55m time: [1.0773 s 1.0872 s 1.0976 s]
`ZhConver{sion,ter}.php` of MediaWiki: zhconv-rs is based solely on the conversion tables listed in `ZhConversion.php`. MediaWiki relies on the PHP built-in function `strtr`, which is inefficient. zhconv-rs ports part of MediaWiki's implementation to support the same conversion rule syntax much more efficiently. All of these implementations share the same leftmost-longest matching strategy, so conversion results should generally be the same given the same conversion tables.
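To make the leftmost-longest strategy concrete, here is a minimal sketch using only the Rust standard library. It is a naive scan over a toy three-entry table and is not how zhconv-rs is implemented; the `toy_table` and `convert` functions are hypothetical names for this example, and the entries are chosen for illustration rather than copied from the MediaWiki tables.

```rust
use std::collections::HashMap;

/// A toy conversion table (illustrative entries only).
fn toy_table() -> HashMap<&'static str, &'static str> {
    HashMap::from([("头", "頭"), ("发", "發"), ("头发", "頭髮")])
}

/// Naive leftmost-longest conversion: at each position, apply the longest
/// key that matches there; if none matches, copy one character through.
fn convert(text: &str, table: &HashMap<&str, &str>) -> String {
    // The longest key (in chars) bounds how far ahead we need to look.
    let max_len = table.keys().map(|k| k.chars().count()).max().unwrap_or(0);
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let upper = max_len.min(chars.len() - i);
        let mut matched = false;
        // Try the longest candidate first, then progressively shorter ones.
        for len in (1..=upper).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(replacement) = table.get(candidate.as_str()) {
                out.push_str(replacement);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // "头发" is matched as a whole before the single-character rule for "发".
    println!("{}", convert("头发发白", &toy_table())); // prints 頭髮發白
}
```

In the input `头发发白`, the two-character key `头发` wins at position 0, so the first `发` becomes `髮`, while the second `发` falls back to the single-character rule and becomes `發`. Any implementation following the same strategy over the same table should produce the same output, regardless of how the matching is carried out internally.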