Perform Multidimensional Scaling (MDS) for clustered categorical data in Python

I am currently working on clustering categorical attributes from a bank marketing dataset from Kaggle. I have created three clusters with kmodes:

Output: cluster_df

Now I want to visualize each row of a cluster as a projection or point so that I get some kind of image:

Desired visualization

I am having a hard time with this. I can't compute a Euclidean distance on categorical data, right? That would make no sense. So is there no way to create the desired visualization?
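On the Euclidean-distance concern: MDS does not require Euclidean input, it can embed any precomputed dissimilarity matrix in 2-D. The following is a minimal sketch, assuming scikit-learn's `sklearn.manifold.MDS` with `dissimilarity="precomputed"` and using the fraction of attributes on which two rows differ (a normalized form of the simple matching dissimilarity that k-modes is built on). The DataFrame `df` and the `labels` array are hypothetical toy stand-ins for the real `cluster_df` and kmodes labels.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import MDS

# Hypothetical toy data standing in for the categorical columns of cluster_df
df = pd.DataFrame({
    "job": ["admin.", "technician", "admin.", "services", "technician", "services"],
    "marital": ["married", "single", "married", "single", "single", "married"],
    "education": ["secondary", "tertiary", "secondary", "primary", "tertiary", "primary"],
})
labels = np.array([0, 1, 0, 2, 1, 2])  # hypothetical kmodes cluster labels


def matching_dissimilarity(frame: pd.DataFrame) -> np.ndarray:
    """Fraction of attributes on which each pair of rows differs (0 = identical)."""
    values = frame.to_numpy()
    # Broadcast to shape (n, n, n_attributes): True where the categories differ
    mismatches = values[:, None, :] != values[None, :, :]
    return mismatches.mean(axis=2)


D = matching_dissimilarity(df)

# Embed the precomputed dissimilarities as 2-D points
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
```

To get a picture like the desired one, scatter `coords[:, 0]` against `coords[:, 1]` and color the points by `labels` (for example with matplotlib's `plt.scatter(..., c=labels)`). The dissimilarity matrix grows quadratically with the number of rows, so on a large dataset subsampling rows may be necessary.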

A common way to visualize clusters is PCA. You can use PCA to reduce the multi-dimensional data to 2 dimensions so that you can plot it and, hopefully, understand the data better. To use it, see the following code:

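```python
import pandas as pd
from sklearn.decomposition import PCA

# x: numeric feature matrix (for categorical attributes, an encoded version such as one-hot)
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])
```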

where x is the numeric (encoded) data you ran the clustering on. Now you can easily visualize your clustered data, since it is two-dimensional.
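PCA needs numeric input, so with categorical attributes `x` has to be some encoding of them. Here is a minimal sketch, assuming one-hot encoding via `pandas.get_dummies`; the DataFrame `df` and the `labels` array are hypothetical toy stand-ins for the real `cluster_df` and its kmodes labels.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical toy data standing in for the categorical columns of cluster_df
df = pd.DataFrame({
    "job": ["admin.", "technician", "admin.", "services", "technician", "services"],
    "marital": ["married", "single", "married", "single", "single", "married"],
    "education": ["secondary", "tertiary", "secondary", "primary", "tertiary", "primary"],
})
labels = np.array([0, 1, 0, 2, 1, 2])  # hypothetical kmodes cluster labels

# One-hot encode: every category becomes its own 0/1 column, giving PCA numeric input
x = pd.get_dummies(df).astype(float)

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=["principal component 1", "principal component 2"])
# Attach the cluster labels so the points can be colored per cluster when plotted
principalDf["cluster"] = labels
```

With one-hot encoding, two rows that differ on k attributes are at squared Euclidean distance 2k, so Euclidean distances on the encoded data track the same mismatch count that k-modes uses; PCA then picks the 2-D projection that preserves as much of the variance as possible.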