How to Show "Fuzziness" in Fuzzy Clustering Plot Python

I have a normally distributed 2D dataset, and I am using the fcmeans library to perform fuzzy c-means clustering. I can plot the clusters, with a red point marking each cluster center. However, I need to show a gradient where the fuzziness occurs. I am not sure how to implement this in Python, and I haven't been able to find anything like it online.

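import pandas as pd
import matplotlib.pyplot as plt
from fcmeans import FCM

# Convert to a NumPy array so that data[:, 0] indexing works below
data = pd.read_csv("my_data.csv").to_numpy()

model = FCM(n_clusters=2)
model.fit(data)

cntrs = model.centers
hard_prediction_labels = model.predict(data)       # one cluster label per point
soft_prediction_labels = model.soft_predict(data)  # per-cluster membership per point

plt.scatter(data[:, 0], data[:, 1], c=hard_prediction_labels, s=30)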

I believe my mistake comes from the fact that my labels are either 0 or 1; I'm not sure how to define them in a way that lets me determine which points are borderline. I can obtain the probability of each data point belonging to each cluster with the soft_predict() function, but I am unsure how to create a color gradient from it.

Let's say that you have computed the (n_points, n_clusters) array prob, which for n_clusters equal to 2 looks like

print(prob)
# 0.61 0.39
# 0.55 0.45
# .... ....
# 0.07 0.93
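
For reference, an array of this shape can be reproduced without fcmeans. The sketch below applies the standard fuzzy c-means membership formula to assumed cluster centers, purely to show the layout of prob and that each row sums to 1. The function name fcm_memberships, the fuzzifier m=2.0, and the toy points are illustrative assumptions, not taken from the fcmeans library.

```python
import numpy as np

def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means memberships, shape (n_points, n_clusters)."""
    # Euclidean distance from every point to every center.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    dist = np.fmax(dist, 1e-12)  # avoid division by zero for points on a center
    # Membership is proportional to dist ** (-2 / (m - 1)), normalised per point.
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Toy data (assumed): four points on a line, two centers at the ends.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
prob = fcm_memberships(points, centers)
print(prob.round(2))
```

The point halfway between the two centers gets equal membership in both clusters, which is the kind of borderline point the question is about.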

Then you can do the following, so that a point is more opaque when it has a higher probability of belonging to its cluster and more transparent when that probability is lower. I think this is what you want...

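n_clusters = 2
...
for cluster in range(n_clusters):
    in_cluster = hard_prediction_labels == cluster
    color = f"C{cluster}"  # same colour for both layers of a cluster
    # Large marker: opacity scales with membership in the point's own cluster
    # (per-point alpha arrays need Matplotlib 3.4 or newer)
    plt.scatter(data[in_cluster, 0], data[in_cluster, 1],
                s=30, color=color,
                alpha=0.8*prob[in_cluster, cluster])
    # Small opaque marker: keeps low-membership points visible
    plt.scatter(data[in_cluster, 0], data[in_cluster, 1],
                s=5, color=color, alpha=1.0)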
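
A similar effect can probably be had with a single scatter call by building one RGBA colour per point, with the alpha channel taken from the point's membership in its own cluster. This is a minimal sketch, assuming prob is the (n_points, n_clusters) membership array and that the hard label is its row-wise argmax. The helper membership_colors, the two base colours, and the min_alpha floor are hypothetical choices, not part of fcmeans or matplotlib.

```python
import numpy as np

def membership_colors(prob, base_rgb, min_alpha=0.1):
    """Return (labels, rgba): hard labels and one RGBA colour per point."""
    prob = np.asarray(prob, dtype=float)
    labels = prob.argmax(axis=1)                  # hard assignment per point
    own = prob[np.arange(len(prob)), labels]      # membership in own cluster
    rgba = np.empty((len(prob), 4))
    rgba[:, :3] = np.asarray(base_rgb, dtype=float)[labels]
    rgba[:, 3] = np.clip(own, min_alpha, 1.0)     # alpha must lie in [0, 1]
    return labels, rgba

prob = np.array([[0.61, 0.39], [0.55, 0.45], [0.07, 0.93]])
base_rgb = [(0.12, 0.47, 0.71), (1.00, 0.50, 0.05)]  # blue, orange (assumed)
labels, rgba = membership_colors(prob, base_rgb)
# With matplotlib: plt.scatter(data[:, 0], data[:, 1], c=rgba, s=30)
```

Passing an (n_points, 4) array as c lets scatter use each row as that point's colour, so no per-cluster loop is needed.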

NB: I don't have fcmeans installed, so my code is untested and is essentially based on informed speculation. You may need to adjust something to make it work.


EDIT

Plotting each cluster twice, with different marker sizes and transparencies, may give you a better visualization of your data.
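
If a continuous colour gradient is preferred over transparency, one option for the two-cluster case is to colour each point by its membership in one cluster through a diverging colormap, and to outline the borderline points whose largest membership falls below a threshold. This is a sketch under assumptions: the helpers borderline_mask and plot_membership_gradient, the 0.6 threshold, and the coolwarm colormap are illustrative choices, and prob is assumed to be the array returned by soft_predict().

```python
import numpy as np

def borderline_mask(prob, threshold=0.6):
    """True where no cluster reaches a membership of `threshold`."""
    return np.asarray(prob).max(axis=1) < threshold

def plot_membership_gradient(data, prob, centers, threshold=0.6):
    """Scatter 2D data coloured by membership in cluster 0 (two clusters)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    sc = ax.scatter(data[:, 0], data[:, 1], c=prob[:, 0],
                    cmap="coolwarm", vmin=0.0, vmax=1.0, s=30)
    fig.colorbar(sc, ax=ax, label="membership in cluster 0")
    # Outline the borderline points so the fuzzy region stands out.
    edge = borderline_mask(prob, threshold)
    ax.scatter(data[edge, 0], data[edge, 1], s=80,
               facecolors="none", edgecolors="black")
    # Mark the cluster centers in red.
    ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="x", s=100)
    return fig, ax

prob = np.array([[0.61, 0.39], [0.55, 0.45], [0.07, 0.93]])
mask = borderline_mask(prob)
```

With the question's variables this would be called as plot_membership_gradient(data, soft_prediction_labels, cntrs), followed by plt.show(). Points near 0.5 on the colorbar are the ones shared between the two clusters.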


 