Zhon: A Python Library of Constants for Common Chinese Text Processing
jopen
10 years ago
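Zhon is a Python library that provides constants commonly needed when processing Chinese text, such as CJK characters and radicals, Chinese punctuation, pinyin, and regular expression patterns for Chinese text (for example, to find the traditional characters in a text):
- CJK characters and radicals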
- Chinese punctuation marks
- Chinese sentence regular expression pattern
- Pinyin vowels, consonants, lowercase, uppercase, and punctuation
- Pinyin syllable, word, and sentence regular expression patterns
- Zhuyin characters and marks
- Zhuyin syllable regular expression pattern
- CC-CEDICT characters
>>> import re
>>> import zhon.hanzi
>>> re.findall(zhon.hanzi.sentence, '我买了一辆车。妈妈做的菜,很好吃!')
['我买了一辆车。', '妈妈做的菜,很好吃!']
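For readers without Zhon installed, the idea behind a sentence pattern like this can be sketched with the standard library alone. The block below is only a rough approximation and not Zhon's actual definition: the names `HAN_CHARACTERS`, `SENTENCE_STOPS`, `NON_STOPS`, `split_sentences`, and `strip_punctuation` are hypothetical, the character range covers only the basic CJK Unified Ideographs block, and the punctuation sets are a small hand-picked subset.

```python
import re

# Assumption: stand-in for a "Han characters" constant, limited to the
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_CHARACTERS = "\u4e00-\u9fff"

# Assumption: a small subset of sentence-ending Chinese punctuation:
# ideographic full stop, fullwidth exclamation mark, fullwidth question mark.
SENTENCE_STOPS = "\u3002\uff01\uff1f"

# Assumption: a small subset of punctuation that does not end a sentence:
# fullwidth comma, ideographic comma, fullwidth semicolon, fullwidth colon.
NON_STOPS = "\uff0c\u3001\uff1b\uff1a"

# A sentence here is a run of Han characters and non-stop punctuation
# followed by exactly one sentence stop.
SENTENCE = re.compile(
    "[{}{}]*[{}]".format(HAN_CHARACTERS, NON_STOPS, SENTENCE_STOPS)
)


def split_sentences(text):
    """Return every substring of text that matches the SENTENCE pattern."""
    return SENTENCE.findall(text)


def strip_punctuation(text):
    """Remove the punctuation marks listed in SENTENCE_STOPS and NON_STOPS."""
    return re.sub("[{}{}]".format(SENTENCE_STOPS, NON_STOPS), "", text)
```

Calling `split_sentences` on a string of Chinese sentences returns the sentences as a list, and `strip_punctuation` returns the same text with the listed marks removed. Building patterns from named string constants, rather than writing the character classes inline, is what makes constants of this kind reusable across many regular expressions.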