Tokenizer.py source code

Project: nlp_sum  Author: Zhujunnan
import nltk


def english_sentence_segment(text):
    """Segment text into sentences with NLTK's pre-trained Punkt model."""
    try:
        sent_detector = nltk.data.load(
            'tokenizers/punkt/english.pickle'
        )

        # Register extra abbreviations (stored without the trailing period)
        # so the detector does not treat them as sentence boundaries.
        extra_abbrev = ["e.g", "al", "i.e"]
        sent_detector._params.abbrev_types.update(extra_abbrev)
        return sent_detector.tokenize(text)
    except LookupError as e:
        raise LookupError(
            "NLTK tokenizers are missing. Download them with the following "
            '''command: python -c "import nltk; nltk.download('punkt')"'''
        ) from e