get_centroid.py source file

python

Project: BioIR  Author: nlpaueb

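from collections import defaultdict

import numpy as np

# Note: bioclean (the tokenizer) is not defined in this snippet; it is
# expected to come from elsewhere in the project.


def get_centroid_idf(text, emb, idf, stopwords, D):
    """Return the TF-IDF-weighted centroid of the word embeddings in text, shape (1, D)."""
    # Count term frequencies, keeping only tokens that have an embedding
    # and are not stopwords
    tf = defaultdict(int)
    tokens = bioclean(text)
    for word in tokens:
        if word in emb and word not in stopwords:
            tf[word] += 1

    # Accumulate the weighted sum of embeddings; div holds the total weight
    centroid = np.zeros((1, D))
    div = 0

    for word in tf:
        if word in idf:
            p = tf[word] * idf[word]  # TF-IDF weight of this word
            centroid = np.add(centroid, emb[word]*p)
            div += p
    # Normalize by the total weight; if no word qualified, the zero vector is returned
    if div != 0:
        centroid = np.divide(centroid, div)
    return centroid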
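
The function above can be tried on toy data. The following is a minimal sketch, not the project's own code: `simple_tokenize` is a hypothetical stand-in for `bioclean` (whose definition is not shown here), `centroid_idf` restates the same TF-IDF weighting so the example is self-contained, and the 2-dimensional embeddings and IDF values are made up for illustration.

```python
import re
from collections import defaultdict

import numpy as np


def simple_tokenize(text):
    # Hypothetical stand-in for bioclean: lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def centroid_idf(text, emb, idf, stopwords, dim, tokenize=simple_tokenize):
    # Same weighting as get_centroid_idf: each word contributes tf * idf.
    tf = defaultdict(int)
    for word in tokenize(text):
        if word in emb and word not in stopwords:
            tf[word] += 1

    centroid = np.zeros((1, dim))
    total = 0.0
    for word, count in tf.items():
        if word in idf:
            weight = count * idf[word]
            centroid += emb[word] * weight
            total += weight
    # Fall back to the zero vector when no word qualified.
    return centroid / total if total != 0 else centroid


# Toy embeddings and IDF scores (assumed values, for illustration only).
emb = {
    "protein": np.array([1.0, 0.0]),
    "binding": np.array([0.0, 1.0]),
    "the": np.array([5.0, 5.0]),
}
idf = {"protein": 2.0, "binding": 1.0, "the": 0.1}
stopwords = {"the"}

vec = centroid_idf("The protein binding protein", emb, idf, stopwords, dim=2)
print(vec)
```

Here "protein" occurs twice with IDF 2.0 (weight 4) and "binding" once with IDF 1.0 (weight 1), so the centroid is (4·[1, 0] + 1·[0, 1]) / 5, roughly [0.8, 0.2]. The stopword "the" is skipped even though it has an embedding.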
