utils.py source file

python

Project: stochasticLDA  Author: qlai
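from nltk.tokenize import wordpunct_tokenize  # tokenizer used below


def parseDocument(doc, vocab):
    """Return (word ids, counts) for the in-vocabulary tokens of doc.

    vocab maps each known word to its id; tokens not in vocab are skipped.
    """
    # Lowercase before lookup (this assumes the vocab keys are lowercase).
    tokens = wordpunct_tokenize(doc.lower())

    # Count how often each vocabulary id occurs in the document.
    dictionary = dict()
    for word in tokens:
        if word in vocab:
            wordtk = vocab[word]
            dictionary[wordtk] = dictionary.get(wordtk, 0) + 1

    # list() keeps the result indexable on Python 3, where keys()/values() are views.
    return (list(dictionary.keys()), list(dictionary.values()))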
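
To try the same counting logic without installing NLTK, here is a minimal stdlib-only sketch. It is not the project's code: the name `parse_document_counter`, the toy `vocab`, and the sample sentence are made up for illustration. The regular expression is the pattern that NLTK's `WordPunctTokenizer` is built on, so the tokens should match for ordinary text.

```python
import re
from collections import Counter

# Pattern behind NLTK's WordPunctTokenizer: runs of word characters,
# or runs of non-space punctuation.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def parse_document_counter(doc, vocab):
    """Hypothetical stdlib-only variant of parseDocument (illustration only)."""
    tokens = _TOKEN_RE.findall(doc.lower())
    # Counter tallies the vocabulary id of every in-vocabulary token.
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return list(counts.keys()), list(counts.values())


# Toy vocabulary (assumed for this example): word -> integer id.
vocab = {"topic": 0, "model": 1, "word": 2}
ids, counts = parse_document_counter("Topic model, topic WORD; unknown.", vocab)
print(ids, counts)
```

The two returned lists are aligned by position: `counts[i]` is the number of times the word with id `ids[i]` appears in the document.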