`cccedict` is a parser for CC-CEDICT, the Chinese/English natural-language dictionary format. Its distinguishing feature is support for the cantonese.org extensions to the CC-CEDICT format, which add Jyutping pronunciations.
A `CedictEntry` represents a single entry in a `Cedict`:
```
use cccedict::cedict_entry::*;

// One dictionary line: traditional, simplified, [pinyin], {jyutping}, /definitions/
let line = "你好嗎 你好吗 [ni3 hao3 ma5] {nei5 hou2 maa1} /how are you?/";
let entry = CedictEntry::new(line).unwrap();

assert_eq!(entry.traditional, "你好嗎");
assert_eq!(entry.simplified, "你好吗");
assert_eq!(
    entry.pinyin,
    Some(vec![
        Syllable::new("ni", "3"),
        Syllable::new("hao", "3"),
        Syllable::new("ma", "5"),
    ])
);
assert_eq!(
    entry.jyutping,
    Some(vec![
        Syllable::new("nei", "5"),
        Syllable::new("hou", "2"),
        Syllable::new("maa", "1"),
    ])
);
assert_eq!(entry.definitions, Some(vec!["how are you?".to_string()]));
```
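To make the line format concrete, here is a rough, standard-library-only sketch of how such a line breaks down into its fields. It is not how `cccedict` is implemented: `RawFields`, `between`, `split_line`, and `split_syllable` are hypothetical names introduced only for this illustration, and the sketch assumes well-formed input where the definitions start at the first `/`.

```rust
/// The raw, unparsed fields of one dictionary line.
/// Hypothetical type for illustration; not part of cccedict.
#[derive(Debug, PartialEq)]
struct RawFields<'a> {
    traditional: &'a str,
    simplified: &'a str,
    pinyin: Option<&'a str>,
    jyutping: Option<&'a str>,
    definitions: Vec<&'a str>,
}

/// Returns the text between the first `open` and the next `close`, if both exist.
fn between(s: &str, open: char, close: char) -> Option<&str> {
    let start = s.find(open)? + open.len_utf8();
    let len = s[start..].find(close)?;
    Some(&s[start..start + len])
}

/// Splits a line into headwords, optional pronunciations, and definitions.
fn split_line(line: &str) -> Option<RawFields<'_>> {
    // The two headwords are the first two whitespace-separated tokens.
    let mut words = line.split_whitespace();
    let traditional = words.next()?;
    let simplified = words.next()?;

    // Assumption: everything from the first '/' onward is the definition list.
    let (head, defs) = match line.find('/') {
        Some(i) => (&line[..i], &line[i..]),
        None => (line, ""),
    };

    Some(RawFields {
        traditional,
        simplified,
        pinyin: between(head, '[', ']'),   // standard CC-CEDICT field
        jyutping: between(head, '{', '}'), // cantonese.org extension field
        definitions: defs
            .split('/')
            .map(str::trim)
            .filter(|d| !d.is_empty())
            .collect(),
    })
}

/// Splits a numbered syllable such as "ni3" into its letters and tone digit.
fn split_syllable(syllable: &str) -> (&str, &str) {
    match syllable.find(|c: char| c.is_ascii_digit()) {
        Some(i) => syllable.split_at(i),
        None => (syllable, ""),
    }
}
```

With the line from the example above, `split_line` would yield `"ni3 hao3 ma5"` as the pinyin field and `"nei5 hou2 maa1"` as the Jyutping field, and `split_syllable("ni3")` gives the same `("ni", "3")` pair that is passed to `Syllable::new`.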
You can also instantiate a `Cedict` from a string (via `FromStr`), from a `Read` implementor, or from an `AsRef<Path>` implementor:
```
use cccedict::cedict::Cedict;
use std::path::Path;
use std::str::FromStr;

let cedict_entries = "\
你嘅 你嘅 [ni3 ge2] {nei5 ge3} /your's (spoken)/
你地 你地 [ni3 di4] {nei5 dei6} /you guys; you all/
你好嗎 你好吗 [ni3 hao3 ma5] {nei5 hou2 maa1} /how are you?/";

// From a string
let cedict = Cedict::from_str(cedict_entries).unwrap();
assert_eq!(cedict.entries.len(), 3);

// From any reader (here, a byte slice)
let reader: &[u8] = cedict_entries.as_bytes();
let cedict = Cedict::from_file(reader).unwrap();
assert_eq!(cedict.entries.len(), 3);

// From a path on disk
let path = Path::new("fixtures/cccanto-test.txt");
let cedict = Cedict::from_path(path).unwrap();
assert_eq!(cedict.entries.len(), 3);
```
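The reader-based constructor accepts a byte slice because `&[u8]` implements `std::io::Read`. As a rough illustration of what reading a dictionary from an arbitrary reader can look like, the sketch below collects the entry lines from any `Read` source using only the standard library. `entry_lines` is a hypothetical helper, not part of `cccedict`, and skipping blank lines and `#` comment lines (as found in the header of CC-CEDICT files) is an assumption of this sketch rather than documented crate behavior.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Collects the non-empty, non-comment lines from any reader.
/// Hypothetical helper for illustration; not part of cccedict.
fn entry_lines<R: Read>(reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for line in BufReader::new(reader).lines() {
        let line = line?; // propagate I/O and UTF-8 errors to the caller
        let trimmed = line.trim();
        // Assumption: blank lines and '#' comment lines carry no entries.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        lines.push(trimmed.to_string());
    }
    Ok(lines)
}
```

Because the function is generic over `Read`, the same code would work for a `std::fs::File`, a network stream, or an in-memory `&[u8]` as in the example above.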