corpus.py source file

python

Project: minke  Author: DistrictDataLabs
def words(self, fileids=None, categories=None):
    """
    Uses the built-in word tokenizer to extract tokens from sentences.
    Note that this method uses BeautifulSoup to parse HTML content.
    """
    # Stream tokens lazily, one sentence at a time.
    for sentence in self.sents(fileids, categories):
        for token in self._word_tokenizer.tokenize(sentence):
            yield token
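
The method above depends on a surrounding reader class that is not shown here. As a rough sketch of how the generator behaves, the toy reader below reproduces the same `words` loop. `ToyCorpusReader` and `RegexWordTokenizer` are hypothetical stand-ins invented for illustration, not part of minke; the only assumption carried over from the snippet is that `sents()` yields sentence strings and that the tokenizer exposes a `tokenize()` method.

```python
import re


class RegexWordTokenizer:
    """Hypothetical stand-in for the reader's word tokenizer."""

    # Runs of word characters, or runs of punctuation.
    _pattern = re.compile(r"\w+|[^\w\s]+")

    def tokenize(self, text):
        return self._pattern.findall(text)


class ToyCorpusReader:
    """Hypothetical minimal reader; not the minke implementation."""

    def __init__(self, sentences):
        self._sentences = list(sentences)
        self._word_tokenizer = RegexWordTokenizer()

    def sents(self, fileids=None, categories=None):
        # fileids and categories are ignored in this toy reader.
        for sentence in self._sentences:
            yield sentence

    def words(self, fileids=None, categories=None):
        # Same loop as the snippet: tokenize each sentence lazily.
        for sentence in self.sents(fileids, categories):
            for token in self._word_tokenizer.tokenize(sentence):
                yield token


reader = ToyCorpusReader(["Hello, world!", "Tokens stream lazily."])
tokens = list(reader.words())
print(tokens)
# ['Hello', ',', 'world', '!', 'Tokens', 'stream', 'lazily', '.']
```

Because `words` is a generator, no sentence is tokenized until the caller starts iterating, which keeps memory use flat for large corpora.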