load_data.py source code

python

Project: TopicModel  Author: BUAAQingYuan
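from collections import defaultdict

from gensim import corpora


def load_corpus(data_file):
    # load_texts is not shown in this snippet; it presumably returns a list
    # of tokenized documents (each document a list of tokens)
    texts = load_texts(data_file)
    # remove words that appear only once across the whole collection
    frequency = defaultdict(int)
    for text in texts:
        for token in text:
            frequency[token] += 1
    texts = [[token for token in text if frequency[token] > 1] for text in texts]
    # map each remaining token to an integer id
    dictionary = corpora.Dictionary(texts)
    # doc2bow yields (token_id, count) pairs for each document
    corpus = [dictionary.doc2bow(text) for text in texts]
    # keep only the token ids; the counts are discarded
    corpus = [[token[0] for token in text] for text in corpus]
    return corpus, dictionary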
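
To see what the function above produces, here is a minimal sketch of the same steps on toy data, written without gensim. `build_id_corpus` and `toy_texts` are hypothetical names introduced only for illustration, and the plain dict only imitates what `corpora.Dictionary` and `doc2bow` do; the ids gensim assigns may differ from the first-seen order assumed here.

```python
from collections import defaultdict


def build_id_corpus(texts):
    """Hypothetical stand-in for load_corpus that needs no gensim."""
    # Count how often each token occurs across all documents.
    frequency = defaultdict(int)
    for text in texts:
        for token in text:
            frequency[token] += 1

    # Keep only tokens seen more than once in the whole collection.
    texts = [[t for t in text if frequency[t] > 1] for text in texts]

    # Assign integer ids in first-seen order (assumption: gensim may
    # assign different ids to the same tokens).
    token2id = {}
    for text in texts:
        for token in text:
            token2id.setdefault(token, len(token2id))

    # Each document becomes the sorted list of its distinct token ids,
    # mirroring doc2bow with the counts dropped.
    corpus = [sorted({token2id[t] for t in text}) for text in texts]
    return corpus, token2id


toy_texts = [
    ["topic", "model", "lda"],
    ["topic", "model", "gensim"],
    ["lda", "topic"],
]
corpus, token2id = build_id_corpus(toy_texts)
print(corpus)    # [[0, 1, 2], [0, 1], [0, 2]]
print(token2id)  # {'topic': 0, 'model': 1, 'lda': 2}
```

The token "gensim" occurs only once in the toy collection, so it is filtered out before ids are assigned. Because only ids are kept, a token repeated within one document appears once in that document's list.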