tfidf.py source file

python

Project: newsgraph  Author: exchez
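import numpy as np
import pandas as pd
import ujson

# `r` (the Redis client), `str2int`, and `generate_random_vectors` are defined elsewhere in the project.

def calculate_query_bin_bits(tfidf):
    # TODO (original author): this also needs to return the bin id, in addition to the table from Redis.
    # Load the stored table, the vector dimension, and the vocabulary from Redis.
    table = str2int(ujson.loads(r.get('table')))
    dim = int(r.get('dim'))
    mapping = ujson.loads(r.get('map'))
    mapping = pd.DataFrame({'word': mapping})
    num_vectors = 16

    # Turn the {word: tf-idf value} dict into a two-column frame.
    words = list(tfidf.keys())
    values = list(tfidf.values())
    tfidf_df = pd.DataFrame({'word': words, 'value': values})

    # Align the query with the vocabulary order; words absent from the query get 0.
    article_representation = pd.merge(mapping, tfidf_df, on='word', how='left').fillna(0)['value']

    # One bit per random vector: True where the projection is non-negative.
    bin_vectors = generate_random_vectors(num_vectors, dim)
    powers_of_two = 1 << np.arange(num_vectors - 1, -1, -1)  # bit weights for a bin id; currently unused
    query_bin_bits = (article_representation.dot(bin_vectors) >= 0)

    return query_bin_bits, table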
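
The function computes `powers_of_two` but never uses it, and its leading comment says a bin id should also be returned. A minimal sketch of how the bits could be folded into an integer bin id follows. It is not the project's code: `bits_to_bin_id` and `query_bin_id` are hypothetical names, and the `generate_random_vectors` below is a stand-in that assumes the real helper returns a `(dim, num_vectors)` array of Gaussian samples.

```python
import numpy as np


def generate_random_vectors(num_vectors, dim, seed=0):
    # Hypothetical stand-in for the project's helper (assumption):
    # returns a (dim, num_vectors) matrix of standard-normal samples.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, num_vectors))


def bits_to_bin_id(bits):
    # Treat the bits as a binary number, most significant bit first.
    bits = np.asarray(bits, dtype=np.int64)
    powers_of_two = 1 << np.arange(len(bits) - 1, -1, -1)
    return int(bits.dot(powers_of_two))


def query_bin_id(vector, bin_vectors):
    # One bit per random vector: 1 where the projection is non-negative.
    bits = np.asarray(vector).dot(bin_vectors) >= 0
    return bits_to_bin_id(bits)


if __name__ == "__main__":
    bin_vectors = generate_random_vectors(num_vectors=16, dim=4)
    print(query_bin_id(np.ones(4), bin_vectors))
```

With 16 random vectors the bin id falls in the range 0 to 65535. For lookups to land in the right bin, the query would have to be projected onto the same random vectors that were used when the table was built; the sketch uses a fixed seed for that purpose.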