LogClustering_HDFS.py source code


Project: loglizer    Author: logpai
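```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# `para` (parameter dict holding 'max_d') and `distCalculate` (pairwise distance
# function) are module-level names defined elsewhere in LogClustering_HDFS.py.

def clustering(partSeqList, partData):
    print('clustering for the separated dataset')
    # simiMatrix = simiMatrixCal(partData)
    # Invoke SciPy's hierarchical clustering routines
    data_dist = pdist(partData, metric=distCalculate)  # condensed pairwise distances
    Z = linkage(data_dist, 'complete')  # complete-linkage merge tree
    # Cut the tree so each flat cluster's cophenetic distance is at most max_d
    clusterLabels = fcluster(Z, para['max_d'], criterion='distance')
    clusNum = len(np.unique(clusterLabels))
    print('there are altogether %d clusters in this initial clustering' % clusNum)
    # fcluster labels run from 1 to clusNum, so shift them to 0-based list indices
    instIndexPerClus = [[] for _ in range(clusNum)]
    for seq, lab in zip(partSeqList, clusterLabels):
        instIndexPerClus[lab - 1].append(seq)
    return clusterLabels, instIndexPerClus
```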
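To try the same pdist, complete-linkage, fcluster pipeline in isolation, here is a minimal self-contained sketch on toy data. The snippet above does not show `distCalculate` or the value of `para['max_d']`, so the cosine distance `cosine_dist`, the helper name `cluster_sequences`, the threshold `0.2`, and the event-count vectors are all hypothetical assumptions, not loglizer's actual definitions or data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cosine_dist(u, v):
    """Hypothetical stand-in for distCalculate: 1 - cosine similarity, clamped at 0."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        return 1.0
    return max(0.0, 1.0 - float(np.dot(u, v)) / denom)


def cluster_sequences(seq_ids, vectors, max_d):
    """Complete-linkage clustering cut at distance max_d; groups seq_ids by cluster."""
    dists = pdist(vectors, metric=cosine_dist)          # condensed distance matrix
    Z = linkage(dists, method='complete')               # hierarchical merge tree
    labels = fcluster(Z, max_d, criterion='distance')   # flat labels numbered from 1
    groups = [[] for _ in range(len(np.unique(labels)))]
    for seq, lab in zip(seq_ids, labels):
        groups[lab - 1].append(seq)                     # label 1 goes to slot 0
    return labels, groups


# Made-up event-count vectors for five hypothetical block IDs
ids = ['blk_a', 'blk_b', 'blk_c', 'blk_d', 'blk_e']
X = np.array([
    [3, 0, 1],
    [6, 0, 2],
    [0, 4, 0],
    [0, 5, 1],
    [3, 0, 1],
], dtype=float)

labels, groups = cluster_sequences(ids, X, max_d=0.2)
print(len(groups))                            # 2
print(sorted(sorted(g) for g in groups))      # [['blk_a', 'blk_b', 'blk_e'], ['blk_c', 'blk_d']]
```

The `lab - 1` indexing mirrors the original function: SciPy's `fcluster` numbers flat clusters starting from 1, so subtracting one maps each label onto a 0-based list slot. Vectors that point in the same direction (`blk_a`, `blk_b`, `blk_e`) end up together under a cosine distance even though their raw counts differ.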