word_cluster.py source code

python


Project: PolBotCheck  Author: codeforfrankfurt

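import nltk


def calc_frequencies(words, words_n=50, lang='german'):
    """Return the words_n most common words as (word, count) pairs."""
    # Drop single-character tokens and purely numeric tokens, then lowercase.
    words = [word for word in words if len(word) > 1]
    words = [word for word in words if not word.isnumeric()]
    words = [word.lower() for word in words]
    # Stopword filtering is disabled (all_stopwords is not defined in this snippet).
    # words = [word for word in words if word not in all_stopwords]
    # Stemming words seems to make matters worse, disabled
    # (lang is only used by this disabled stemmer)
    # stemmer = nltk.stem.snowball.SnowballStemmer(lang)
    # words = [stemmer.stem(word) for word in words]

    fdist = nltk.FreqDist(words)
    return fdist.most_common(words_n)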
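
A minimal usage sketch of the same filtering and counting steps. To keep it runnable without NLTK it swaps in `collections.Counter` for `nltk.FreqDist` (`FreqDist` subclasses `Counter`, so `most_common` behaves the same way). The name `calc_frequencies_sketch` and the sample tokens are hypothetical and not part of PolBotCheck.

```python
from collections import Counter


def calc_frequencies_sketch(words, words_n=50):
    """Stdlib stand-in for calc_frequencies: filter, lowercase, count."""
    # Same filters as the original: skip 1-character and numeric tokens.
    cleaned = [w.lower() for w in words if len(w) > 1 and not w.isnumeric()]
    # Counter.most_common returns (word, count) pairs, highest count first.
    return Counter(cleaned).most_common(words_n)


# Hypothetical sample tokens, e.g. from already-tokenized tweets.
tokens = ["Berlin", "berlin", "2017", "a", "Wahl", "wahl", "Berlin", "EU"]
top = calc_frequencies_sketch(tokens, words_n=2)
print(top)  # [('berlin', 3), ('wahl', 2)]
```

Because lowercasing happens before counting, "Berlin" and "berlin" are merged into one entry, while "2017" and "a" are dropped by the numeric and length filters.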
