tokenizer.py source code

python


Project: quackalike    Author: gumblex

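```python
def cutandsplit(s):
    # filterlist, splitsentence, stripblank, RE_BRACKETS, brcksub, notchinese,
    # cut, tailpunct and headpunct are defined or imported elsewhere in tokenizer.py.
    for ln in filterlist(splitsentence(stripblank(s))):
        l = RE_BRACKETS.sub(brcksub, ln.strip())
        if notchinese(l):
            continue  # skip lines that notchinese() flags
        # Convert corner-bracket quotes to curly quotes, trim stray punctuation
        # at both ends, then segment into words joined by single spaces.
        yield ' '.join(cut(l.replace('「', '“').replace('」', '”')
                            .replace('『', '‘').replace('』', '’')
                            .lstrip(tailpunct).rstrip(headpunct)))
```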
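To show the flow of `cutandsplit` end to end, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in, not the project's code: the sentence-splitting regex, the CJK check that replaces `notchinese`, the quote mapping, the `TAILPUNCT`/`HEADPUNCT` sets, and a character-level `cut` in place of a real word segmenter. The `RE_BRACKETS`/`brcksub`, `stripblank` and `filterlist` steps are omitted.

```python
import re
from typing import Iterator, List

# Hypothetical stand-ins for helpers that tokenizer.py defines elsewhere.
# Assumed mapping: corner-bracket quotes -> curly quotes.
QUOTE_MAP = str.maketrans({'「': '“', '」': '”', '『': '‘', '』': '’'})
# Assumed sets: punctuation that should not start a line / should not end one.
TAILPUNCT = '，。！？；：、”’）'
HEADPUNCT = '“‘（'
RE_SENTENCE = re.compile(r'(?<=[。！？])')  # split right after sentence-final marks
RE_CJK = re.compile(r'[\u4e00-\u9fff]')     # CJK Unified Ideographs block


def cut(text: str) -> List[str]:
    """Placeholder segmenter: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def cutandsplit_sketch(s: str) -> Iterator[str]:
    for ln in RE_SENTENCE.split(s):
        l = ln.strip()
        if not l or not RE_CJK.search(l):
            continue  # skip empty lines and lines without Chinese characters
        # Normalize quotes, then trim misplaced punctuation at both ends.
        l = l.translate(QUOTE_MAP).lstrip(TAILPUNCT).rstrip(HEADPUNCT)
        yield ' '.join(cut(l))


print(list(cutandsplit_sketch('，「你好」。hello。『再见』！')))
# ['“ 你 好 ” 。', '‘ 再 见 ’ ！']
```

Using `str.translate` applies all four quote substitutions in a single pass; because the source and target characters do not overlap, it gives the same result as the chained `replace` calls in the original.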
