preprocessing.py 文件源码

python
阅读 26 收藏 0 点赞 0 评论 0

项目:jack 作者: uclmr 项目源码 文件源码
def sort_by_tfidf(question, paragraphs):
    tfidf = TfidfVectorizer(strip_accents="unicode", stop_words=spacy.en.STOP_WORDS, decode_error='replace')
    try:
        para_features = tfidf.fit_transform(paragraphs)
        q_features = tfidf.transform([question])
    except ValueError:
        return [(i, 0.0) for i in range(len(paragraphs))]

    dists = pairwise_distances(q_features, para_features, "cosine").ravel()
    sorted_ix = np.lexsort((paragraphs, dists))  # in case of ties, use the earlier paragraph

    return [(i, 1.0 - dists[i]) for i in sorted_ix]
评论列表
文章目录


问题


面经


文章

微信
公众号

扫码关注公众号