utils.py source code

python

Project: tRECS  Author: TeeOhh
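import nltk  # needs the "words" corpus: run nltk.download("words") once

def nonenglish(string):
    """Return the string with non-English words removed (useful for course syllabi).

    Parameters: string of descriptions
    Output: the string with non-English words removed; tokens such as
    punctuation are also dropped because they are not in the word list
    """
    # Build the English vocabulary as a set for fast membership tests
    words = set(nltk.corpus.words.words())
    # Keep only tokens whose lowercase form appears in the vocabulary
    result = [w for w in nltk.wordpunct_tokenize(string) if w.lower() in words]
    return " ".join(result)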
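
The filtering idea above can be tried without downloading any corpus. The following is a minimal sketch, not the project's code: `keep_known_words` and `sample_vocab` are hypothetical names introduced for illustration, and the tiny vocabulary stands in for NLTK's word list. The regular expression is the same pattern that NLTK's `wordpunct_tokenize` is built on.

```python
import re

# Pattern used by NLTK's WordPunctTokenizer: runs of word characters,
# or runs of non-word, non-space characters (punctuation).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def keep_known_words(text, vocabulary):
    """Keep only tokens whose lowercase form is in `vocabulary` (hypothetical helper)."""
    vocab = {w.lower() for w in vocabulary}
    tokens = _TOKEN_RE.findall(text)
    return " ".join(t for t in tokens if t.lower() in vocab)


# Assumed stand-in for the NLTK word list, for illustration only.
sample_vocab = {"course", "covers", "linear", "algebra"}

print(keep_known_words("Course covers álgebra lineal, linear algebra!", sample_vocab))
# prints: Course covers linear algebra
```

Note that the original casing of kept tokens is preserved, while punctuation and any word missing from the vocabulary are dropped, so the result depends entirely on how complete the word list is.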