How to calculate the distance of each cluster in a scatter plot
I have 2 clusters plotted in a scatter plot and I need to find their standard deviation and the distance from the center of one cluster to the center of the other. I was not able to find any guide or documentation that simplifies the process of finding the centers of 2 clusters in a scatter plot. The reason is that I need to compare the scatter of each cluster with the distance between the cluster centers.
My actual scatter plot looks like this:
import matplotlib.pyplot as plt
import numpy as np
vector1 = [
2.8238,
3.0284,
5.9333,
2.0156,
2.2467,
2.0092,
4.7983,
4.3554,
3.6372,
1.3159,
2.6174,
2.2336,
0.9625,
5.6285,
5.4040,
2.7887,
0,
3.4632,
0,
2.7370
]
vector5 = [
1.2994,
7.4469,
3.6503,
2.1667,
4.1975,
3.3006,
10.4082,
3.4112,
2.2395,
1.5653,
4.3237,
1.8679,
1.2622,
14.1372,
6.1686,
3.8903,
2.2873,
6.2559,
0.2132,
7.2303,
]
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.style.use('ggplot')
data = [vector1, vector5]  # the original referenced an undefined name (std_colomns4); vector5 is assumed
plt.plot(vector1 , marker='.', linestyle='none', markersize=20, label='Vector 1')
plt.plot(vector5, marker='.', linestyle='none', markersize=20, label='Vector 5')
plt.xticks(range(1, 20, 1))
plt.yticks(range(1, 20, 1))
plt.ylabel('Sizes')
plt.xlabel('Index')
plt.legend()
plt.show()
For the sake of pre-visualization:
You can compute the mean of each cluster by converting the lists to NumPy arrays:
vector1 = np.array([...])
vector5 = np.array([...])
mean1 = np.mean(vector1)
mean5 = np.mean(vector5)
# Rest of the code
plt.plot((vector1+vector5)/2, marker='x', linestyle='none', markersize=12, label='Mean')
plt.axhline(mean1)
plt.axhline(mean5, c='b')
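To get the numbers the question asks for (the spread of each cluster and the distance between the centers), a minimal sketch could look like the following. It assumes the center of a cluster is simply the mean of its values, which is reasonable here because both vectors are plotted against the same indices and so differ only along the y axis. The helper `cluster_stats` and the `separation` ratio are illustrative names, not part of the original post, and `np.std` is used with its default population formula (`ddof=0`); pass `ddof=1` if the sample standard deviation is preferred.

```python
import numpy as np

vector1 = np.array([2.8238, 3.0284, 5.9333, 2.0156, 2.2467, 2.0092, 4.7983,
                    4.3554, 3.6372, 1.3159, 2.6174, 2.2336, 0.9625, 5.6285,
                    5.4040, 2.7887, 0, 3.4632, 0, 2.7370])
vector5 = np.array([1.2994, 7.4469, 3.6503, 2.1667, 4.1975, 3.3006, 10.4082,
                    3.4112, 2.2395, 1.5653, 4.3237, 1.8679, 1.2622, 14.1372,
                    6.1686, 3.8903, 2.2873, 6.2559, 0.2132, 7.2303])


def cluster_stats(values):
    """Return (center, spread): the mean and population standard deviation."""
    return float(np.mean(values)), float(np.std(values))


center1, spread1 = cluster_stats(vector1)
center5, spread5 = cluster_stats(vector5)

# Both clusters share the same x positions (indices 0..19), so the centers
# differ only in y and the distance reduces to the absolute mean difference.
center_distance = abs(center5 - center1)

# Hypothetical comparison: center distance relative to the average spread.
# Values well below 1 suggest the clusters overlap heavily.
separation = center_distance / ((spread1 + spread5) / 2)

print(f"Vector 1: center={center1:.4f}, std={spread1:.4f}")
print(f"Vector 5: center={center5:.4f}, std={spread5:.4f}")
print(f"Distance between centers: {center_distance:.4f}")
print(f"Distance / average std:   {separation:.4f}")
```

If the clusters had distinct x coordinates as well, the same idea would extend to 2D by taking the mean of each coordinate as the centroid and using `np.linalg.norm` on the difference of the two centroids.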