article_language.py source code

python

Project: fake_news    Author: bmassman
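from nltk.corpus import stopwords
from nltk.tokenize import wordpunct_tokenize


def calculate_languages_ratios(text):
    """
    For each language in the NLTK stopwords corpus, count the unique
    stopwords of that language that appear in the analyzed text.

    Despite the name, the returned values are counts, not ratios.
    """
    languages_ratios = {}
    # Split into word and punctuation tokens, then deduplicate in lowercase.
    tokens = wordpunct_tokenize(text)
    words = {word.lower() for word in tokens}
    for language in stopwords.fileids():
        stopwords_set = set(stopwords.words(language))
        # Unique stopwords of this language that occur in the text.
        common_elements = words & stopwords_set
        languages_ratios[language] = len(common_elements)
    return languages_ratios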
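
The per-language counts are typically used to pick the most likely language of a text. The following is a minimal sketch of that step, not code from the fake_news project: `SAMPLE_STOPWORDS`, `count_stopwords`, and `detect_language` are hypothetical names, and the tiny hand-picked stopword sets stand in for the NLTK corpus so the example runs without downloading any data.

```python
import re

# Hypothetical sample data: a few stopwords per language, chosen by hand.
# These are NOT the NLTK stopword lists; they only keep the sketch standalone.
SAMPLE_STOPWORDS = {
    "english": {"the", "and", "is", "in", "of", "to"},
    "french": {"le", "la", "et", "est", "dans", "de"},
    "german": {"der", "die", "und", "ist", "in", "zu"},
}


def count_stopwords(text, stopword_sets=SAMPLE_STOPWORDS):
    """Count unique stopwords of each language that appear in the text."""
    # A simple regex tokenizer is assumed here in place of wordpunct_tokenize.
    words = {word.lower() for word in re.findall(r"\w+", text)}
    return {lang: len(words & sw) for lang, sw in stopword_sets.items()}


def detect_language(text, stopword_sets=SAMPLE_STOPWORDS):
    """Return the language with the most stopword hits, or None if no hits."""
    counts = count_stopwords(text, stopword_sets)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(detect_language("The cat is in the garden and the dog is out"))
print(detect_language("Le chat est dans le jardin et la maison"))
```

With the real function, the same selection could be written as `max(ratios, key=ratios.get)` over the dictionary returned by `calculate_languages_ratios`. Short texts and languages with overlapping stopwords can produce ties or wrong guesses, so the result is best treated as a heuristic.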