How to cluster the values of a dict in Python?
Basically, I have a dict in Python with string keys and arrays of ints as values.
dict = {"Option1Results" : [4, 1, 5, 2, 4],
"Option2Results" : [11, 44, 2, 1, 5],
....
}
I would like to implement hierarchical clustering on this dict based on the intersection of the values. For example, if Option1Results and Option4Results share about 70% of the same integers, they should be clustered together. Is there a way to go about this other than looping through the dictionary and comparing the values one by one?
I think you could use two techniques: cosine similarity and kmeans.
cosine similarity:
Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
https://en.wikipedia.org/wiki/Cosine_similarity
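import random
from sklearn.metrics import pairwise
data = {'Option{}Results'.format(i): [random.randint(1, 100) for _ in range(5)] for i in range(100)}
values = list(data.values())  # dict views are not indexable in Python 3
pairwise.cosine_similarity([values[0]], [values[1]])  # inputs must be 2-D, one row per sample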
array([[ 0.85988428]])
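For reference, the same quantity can be computed straight from the definition (dot product divided by the product of the norms). This is only a minimal sketch using the two sample lists from the question; the helper name cosine_similarity below is my own and is not the sklearn function. Note that cosine similarity compares the lists position by position as vectors, so it does not directly measure how many integers two lists share.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Sample lists taken from the question.
option1 = [4, 1, 5, 2, 4]
option2 = [11, 44, 2, 1, 5]

similarity = cosine_similarity(option1, option2)
print(round(similarity, 4))
```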
kmeans:
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
https://en.wikipedia.org/wiki/K-means_clustering
from sklearn.cluster import KMeans
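# dict views must be materialized as a list of rows in Python 3
kmeans = KMeans(n_clusters=5, random_state=0).fit(list(data.values()))
# predict expects a 2-D input: one row per sample
kmeans.predict([data['Option70Results']])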
array([2])
To find the intersection of the values of the given dict as a set:
intersection = set.intersection(*map(set, dict.values()))
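The one-liner above gives the integers common to every list at once. For the pairwise "share about 70%" criterion in the question, one option is the Jaccard index (size of the intersection divided by size of the union). The sketch below is only an illustration: the dict name results, the Option3Results entry and the helper jaccard are made up for the example, and it still compares pairs, just compactly.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of distinct integers common to both lists: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

# Hypothetical sample data in the shape described in the question.
results = {
    "Option1Results": [4, 1, 5, 2, 4],
    "Option2Results": [11, 44, 2, 1, 5],
    "Option3Results": [4, 1, 5, 2, 9],
}

# Overlap score for every unordered pair of keys.
overlap = {
    (k1, k2): jaccard(results[k1], results[k2])
    for k1, k2 in combinations(results, 2)
}
for pair, score in overlap.items():
    print(pair, round(score, 2))
```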
Hierarchical clustering can be achieved using scipy's linkage and fcluster. Hierarchical clustering using scipy is explained by this answer.
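A rough sketch of how linkage and fcluster could be combined with an intersection-based distance follows. Everything specific here is an assumption on my part: the sample dict results, the multi-hot encoding, the Jaccard metric, average linkage and the 0.3 cut-off (roughly "at least 70% shared") are illustrative choices, not something the question prescribes.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical sample data in the shape described in the question.
results = {
    "Option1Results": [4, 1, 5, 2, 4],
    "Option2Results": [11, 44, 2, 1, 5],
    "Option3Results": [4, 1, 5, 2, 9],
    "Option4Results": [11, 44, 2, 1, 1],
}
names = list(results)

# Multi-hot encoding: one boolean column per distinct integer in the data.
vocabulary = sorted(set().union(*results.values()))
matrix = np.array(
    [[value in set(results[name]) for value in vocabulary] for name in names],
    dtype=bool,
)

# Jaccard distance = 1 - |A & B| / |A | B|, returned in condensed form.
distances = pdist(matrix, metric="jaccard")
tree = linkage(distances, method="average")

# Cut the tree where the average Jaccard distance exceeds 0.3.
labels = fcluster(tree, t=0.3, criterion="distance")

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(int(label), []).append(name)
print(clusters)
```

With this sample data, Option1Results and Option3Results end up in one cluster and Option2Results and Option4Results in the other; the numeric labels themselves are arbitrary. pdist still computes every pairwise distance internally, but the loop over the dictionary no longer has to be written by hand.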