Clustering using SOM

I have a dataset consisting of 6-dimensional data points. I want to produce a self-organizing map (SOM) for this data to see how my data is clustered and how many different clusters there are in my dataset. My dataset is UNLABELED, and all the examples that I came across are labelled (the iris dataset). I have used various Python packages (minisom, sompy, susi) to implement a SOM, but I am unable to visualize and interpret the results.
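A SOM is trained without labels; in the iris examples the labels are only used afterwards to colour the plots. The following minimal NumPy sketch of online SOM training illustrates this. It is not the implementation used by minisom, sompy or susi: the function name train_som, the grid size, the linear decay schedules and the synthetic two-blob data are all assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, rows=10, cols=10, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Hypothetical helper: train a rectangular SOM on unlabeled data with online updates."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise the node weights with randomly chosen data points
    weights = data[rng.integers(0, n, size=rows * cols)].reshape(rows, cols, dim).astype(float)
    # Grid coordinates of every node, shape (rows, cols, 2)
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius (never below 0.5)
        x = data[rng.integers(0, n)]         # one random sample, no label needed
        # Best-matching unit: the node whose weight vector is closest to x
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighbourhood around the BMU, measured on the grid
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbours towards x
        weights += lr * h[..., None] * (x - weights)
    return weights

rng = np.random.default_rng(1)
# Hypothetical unlabeled 6-D data: two well-separated blobs, labels never used
X = np.vstack([rng.normal(0, 1, (100, 6)), rng.normal(6, 1, (100, 6))])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standard scaling, as in the question
W = train_som(X, rows=8, cols=8, n_iter=3000)
print(W.shape)  # (8, 8, 6)
```

The trained weight grid W is what the U-matrix and the per-sample node assignments are computed from.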

I would ask this community to help me with this, and I would really appreciate a link to good work on SOM-based clustering of data with more than 3 dimensions, including a proper evaluation of the results.

MORE INFO:

Thanks. I was able to understand the U-matrix. However, I am still struggling to cluster similar data points.
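For reference, a U-matrix is commonly computed as the average distance between each node's weight vector and those of its grid neighbours. Low values mark regions of similar nodes (candidate clusters) and ridges of high values mark boundaries between them. Below is a minimal NumPy sketch using 4-connected neighbours; libraries differ in the neighbourhood and normalisation they use, and the function name u_matrix and the toy weight grid are hypothetical.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each node's weight vector to its 4-connected grid neighbours."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            d = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:  # skip neighbours outside the grid
                    d.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            u[i, j] = np.mean(d)
    return u

# Toy 4 x 4 map of 6-D weights: left half all 0, right half all 5
W = np.zeros((4, 4, 6))
W[:, 2:, :] = 5.0
u = u_matrix(W)
print(np.round(u, 2))
```

In the toy grid only the two columns on either side of the boundary get non-zero values. On a real map, counting the low-valued regions separated by such ridges is one way to estimate how many clusters there are.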

This is a sample of the dataset:

A  B         C       D          E           F
1  0.000613  150386  20.279685  39400220.0  0.672270
1  0.000649  154428  21.069894  8444300.0   0.466464
1  0.000276  154017  20.890017  12361590.0  0.399357
1  0.000186  68675   20.419599  13973180.0  0.430975
1  0.000177  60795   23.276564  5686630.0   0.372155

This is the result of the SOM clustering:

A  B             C    D          E        F         Cluster-id
5  1.096415e-07  274  12.599589  4870.0   0.000060  19
5  1.185185e-07  205  12.108413  10000.0  0.000402  19
5  1.131892e-07  221  12.282051  290.0    0.000014  19
5  1.447471e-07  338  12.708078  1750.0   0.000027  19
5  8.218939e-08  244  12.000000  30.0     0.000027  19
...
5  2.425165e-08  26   12.517500  2020.0   0.000025  19
5  2.926305e-08  51   12.051724  2320.0   0.000012  19
5  2.326685e-08  18   11.724138  290.0    0.000009  19
5  2.465502e-08  18   12.288000  2500.0   0.000018  19
5  5.118597e-08  80   11.776271  2950.0   0.000093  19

If you look at the result above, attributes C and E vary significantly compared to the other attributes, even though the rows belong to the same cluster. What is the plausible reason behind this?

And how can I solve this, with the aim of getting clusters that contain similar data points? (FYI: I applied standard scaling to the dataset to equalize the variance of each attribute.)
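One plausible explanation, offered as an assumption since the full data isn't shown: C and E span several orders of magnitude. Inside cluster 19 alone, E runs from 30.0 to 10000.0 and C from 18 to 338, while the first sample has E values up to 39400220.0. With standard scaling, the standard deviation of such a column is dominated by its largest values, so 30 and 10000 end up at almost the same scaled position and the SOM treats them as similar. Applying a log transform before scaling is one common remedy. The sketch below uses hypothetical values picked from the ranges in the question, not the real data, and the helper name standard_scale is illustrative.

```python
import numpy as np

def standard_scale(col):
    """Zero mean, unit variance for one column, as done by standard scaling."""
    return (col - col.mean()) / col.std()

# Hypothetical values for a heavy-tailed attribute such as E,
# picked from the ranges shown in the question (not the real data)
e = np.array([30.0, 290.0, 2500.0, 10000.0, 8444300.0, 39400220.0])

scaled = standard_scale(e)                 # standard scaling only
log_scaled = standard_scale(np.log1p(e))   # log transform first, then scaling

# Distance between 30 and 10000 as the SOM would see it
print(abs(scaled[3] - scaled[0]))          # tiny: the two largest values dominate the std
print(abs(log_scaled[3] - log_scaled[0]))  # much larger after the log transform
```

The same log1p step would be applied to every heavy-tailed column (C and E here, possibly B as well) before scaling and training the SOM. Whether it helps on the real data is something to check, for example by looking at the within-cluster spread again.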

With susi, this works as follows (taken from susi/SOMClustering.ipynb):

import matplotlib.pyplot as plt
import susi

som = susi.SOMClustering()
som.fit(X)  # <- X is your dataset without labels

# get the SOM grid position (row, column) of every data point
clusters = som.get_clusters(X)

# plot the data points on the SOM grid; X has no labels, so no colour mapping is used
plt.scatter(x=[c[1] for c in clusters], y=[c[0] for c in clusters], alpha=0.2)
plt.gca().invert_yaxis()
plt.show()
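The plotting code relies on get_clusters returning one grid position per sample. Conceptually this is a best-matching-unit lookup: each sample is assigned to the node whose weight vector is nearest. Here is a minimal NumPy sketch of that idea; it is not susi's source, and the function name best_matching_units and the 2 x 2 toy grid are hypothetical.

```python
import numpy as np

def best_matching_units(X, weights):
    """Return the (row, col) of the closest SOM node for every row of X."""
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)  # shape (rows * cols, dim)
    # Squared Euclidean distance between every sample and every node
    d2 = ((X[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)          # index of the nearest node per sample
    return [tuple(int(v) for v in np.unravel_index(i, (rows, cols))) for i in idx]

# Toy 2 x 2 map with 2-D weights at the corners of a square
W = np.array([[[0.0, 0.0], [0.0, 10.0]],
              [[10.0, 0.0], [10.0, 10.0]]])
X = np.array([[1.0, 1.0], [9.0, 9.0], [0.5, 9.5]])
print(best_matching_units(X, W))  # [(0, 0), (1, 1), (0, 1)]
```

A node is not the same thing as a cluster: a map usually has many more nodes than there are real clusters, so neighbouring nodes that sit in the same low region of the U-matrix are typically grouped together in a separate step.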

Does that work for you? If not, please give us more information about your data.

Disclaimer: I am the developer of susi.
