ContextExtractor.py source code

python


Project: quetch  Author: juliakreutzer

```python
import codecs

from gensim import corpora


def corpus2dict(corpusfiles):
    """From the given corpus files, create a gensim Dictionary mapping words to integer ids."""
    corpus = list()
    corpus.append(["PADDING"])  # has word index 0
    corpus.append(["UNKNOWN"])  # has word index 1
    for cf in corpusfiles:
        if cf is not None:  # a source file can be None
            with codecs.open(cf, "r", "utf8") as f:
                # preprocess() comes from elsewhere in the project (not shown);
                # it must return a list of token lists for corpora.Dictionary
                corpus.extend(preprocess(f.readlines()))
    wordDictionary = corpora.Dictionary(corpus)
    return wordDictionary
```
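The two reserved entries suggest how the dictionary is meant to be used downstream: words missing from the dictionary map to the UNKNOWN id (1), and sequences are filled out with the PADDING id (0). The sketch below illustrates that idea under stated assumptions and is not code from quetch: `build_vocab`, `encode`, and the fixed-length padding step are hypothetical, and it uses a plain dict instead of gensim so it runs without extra dependencies. The id order for non-reserved words is first-seen order, which may differ from the ids gensim assigns.

```python
from typing import Dict, List, Sequence

# Reserved ids, mirroring corpus2dict (PADDING -> 0, UNKNOWN -> 1)
PADDING_ID = 0
UNKNOWN_ID = 1


def build_vocab(sentences: Sequence[List[str]]) -> Dict[str, int]:
    """Assign integer ids to tokens, with 0 and 1 reserved as in corpus2dict.

    Hypothetical helper: other tokens get ids in order of first appearance,
    which is not necessarily the order gensim uses.
    """
    vocab = {"PADDING": PADDING_ID, "UNKNOWN": UNKNOWN_ID}
    for sentence in sentences:
        for token in sentence:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Map tokens to ids (unknown words -> UNKNOWN_ID), then pad or truncate to `length`."""
    ids = [vocab.get(t, UNKNOWN_ID) for t in tokens][:length]
    return ids + [PADDING_ID] * (length - len(ids))


vocab = build_vocab([["the", "cat", "sat"], ["the", "dog"]])
print(encode(["the", "bird", "sat"], vocab, 5))  # prints [2, 1, 4, 0, 0]
```

With the gensim Dictionary that `corpus2dict` returns, the equivalent lookup would be `wordDictionary.token2id.get(word, 1)`, since `token2id` maps each token string to its integer id.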
