How to cluster the values of a dict in Python?

Question

Basically, I have a dict in Python with string keys and arrays of ints as values.

dict = {"Option1Results" : [4, 1, 5, 2, 4],
        "Option2Results" : [11, 44, 2, 1, 5],
        ....
        }

I would like to implement hierarchical clustering on this dict based on the intersection of the values. For example, if Option1Results and Option4Results share about 70% of the same integers, they should be clustered together. Is there a way to go about this other than looping through the dictionary and comparing the values one by one?

Answer 1

I think you could use two tools: cosine similarity and k-means.

cosine similarity:

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
https://en.wikipedia.org/wiki/Cosine_similarity

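import random
from sklearn.metrics import pairwise

data = {'Option{}Results'.format(i): [random.randint(1, 100) for _ in range(5)] for i in range(100)}
values = list(data.values())  # dict views cannot be indexed in Python 3
pairwise.cosine_similarity([values[0]], [values[1]])  # each argument must be 2-D
# e.g. array([[ 0.85988428]]); the exact value varies with the random data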
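To make the definition concrete, here is a minimal sketch that computes the same quantity by hand, assuming plain Python lists of equal length. The helper name cosine_similarity and the choice of the two lists from the question are only for illustration. Note that this compares the lists position by position, which is not the same thing as measuring how many integers they share.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# The two lists from the question, treated as positional vectors.
option1 = [4, 1, 5, 2, 4]
option2 = [11, 44, 2, 1, 5]
similarity = cosine_similarity(option1, option2)
```

Parallel vectors give a value of 1 and orthogonal vectors give 0, regardless of their lengths.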

kmeans:

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
https://en.wikipedia.org/wiki/K-means_clustering

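from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5, random_state=0).fit(list(data.values()))  # one row per option
kmeans.predict([data['Option70Results']])  # predict expects a 2-D input
# e.g. array([2]); the label depends on the random data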

Answer 2

To find the intersection of the values of the given dict as a set:

intersection = set.intersection(*map(set, dict.values()))
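The line above yields the integers common to every value at once. Since the question asks how much two particular options share, a pairwise measure may be closer to what is needed. The sketch below uses the Jaccard index (shared distinct values divided by all distinct values) as one possible definition of "share about 70%". The name results, the helper jaccard, and the Option3Results row are made up for illustration.

```python
from itertools import combinations

# Hypothetical sample data in the shape described in the question.
results = {
    "Option1Results": [4, 1, 5, 2, 4],
    "Option2Results": [11, 44, 2, 1, 5],
    "Option3Results": [4, 1, 5, 2, 9],
}

def jaccard(a, b):
    """Fraction of distinct values shared by a and b, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Overlap score for every unordered pair of keys.
pairwise_overlap = {
    (k1, k2): jaccard(results[k1], results[k2])
    for k1, k2 in combinations(results, 2)
}
```

This still visits every pair, but the comparison itself is a set operation rather than an element-by-element loop.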

Hierarchical clustering can be achieved using scipy's linkage and fcluster. Hierarchical clustering using scipy is explained by this answer.
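As a rough sketch of how linkage and fcluster could be combined for this question, the example below encodes each option as a boolean membership row, computes Jaccard distances with scipy's pdist, and cuts the tree at a distance of 0.3 (roughly "at least 70% shared"). The sample data beyond the first two rows, the average linkage method, and the 0.3 threshold are all assumptions chosen for illustration, not something the original answer specifies.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical sample data in the shape described in the question.
results = {
    "Option1Results": [4, 1, 5, 2, 4],
    "Option2Results": [11, 44, 2, 1, 5],
    "Option3Results": [4, 1, 5, 2, 9],
    "Option4Results": [11, 44, 2, 1, 11],
}

names = list(results)
vocabulary = sorted(set().union(*results.values()))

# One boolean row per option: True where the option contains that integer.
membership = np.array(
    [[value in members for value in vocabulary]
     for members in map(set, results.values())],
    dtype=bool,
)

# Condensed pairwise Jaccard distances: 1 - |A & B| / |A | B|.
distances = pdist(membership, metric="jaccard")
tree = linkage(distances, method="average")

# Flat clusters whose members are within a cophenetic distance of 0.3.
labels = fcluster(tree, t=0.3, criterion="distance")
clusters = dict(zip(names, labels))
```

With this data, Option1Results and Option3Results end up in one cluster and Option2Results and Option4Results in another. The pairwise comparison still happens, but inside pdist rather than in a hand-written loop.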
