
How do I use the silhouette function in MATLAB?

I have a question about how to use the silhouette function in MATLAB.

Suppose I have a correlation matrix X of size 90x90 and the cluster membership numbers for my data; say I have five clusters. The membership is stored in cidx, a 90x1 vector in which each value is a number from 1 to 5.

Can I just pass the correlation matrix and cidx to the silhouette function and specify the measure as 'correlation', or should I be passing in my returns matrix instead?

Thanks for your help!

First of all, you need to form your clusters. For example, the kmeans function in MATLAB does this for you.

cidx = kmeans(X,2,'Distance','sqeuclidean');  % kmeans offers squared Euclidean, not plain 'Euclidean'

According to the MATLAB documentation:

IDX = kmeans(X,k) partitions the points in the n-by-p data matrix X into k clusters. This iterative partitioning minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. Rows of X correspond to points, columns correspond to variables. kmeans returns an n-by-1 vector IDX containing the cluster indices of each point.

So here cidx is the n-by-1 vector of cluster indices. Once you have the indices, you can pass X and cidx to the silhouette function:

s = silhouette(X,cidx,'Euclidean')

s is an n-by-1 vector containing the silhouette value of each point.
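On the original question of which matrix to pass: silhouette, like kmeans, treats each row of its first argument as one observation, and the 'correlation' metric is one minus the sample correlation between rows. A sketch along the following lines would therefore pass the returns matrix, transposed so that each of the 90 series is a row, rather than the 90x90 correlation matrix. The name R, its T-by-90 orientation, and the random stand-in data are assumptions for illustration only.

```matlab
% Hypothetical returns matrix: rows = time periods, columns = the 90 series.
rng(0);                          % fixed seed so the sketch is repeatable
T = 250; N = 90; k = 5;
R = randn(T, N);                 % stand-in for the real returns data

% Each series must be a ROW for kmeans and silhouette, so transpose R.
cidx = kmeans(R', k, 'Distance', 'correlation', 'Replicates', 5);

% Silhouette on the same observations, with the same metric as the clustering.
s = silhouette(R', cidx, 'correlation');

% Alternative: supply precomputed pairwise distances in pdist vector form.
% In that case the first argument is not used and can be left empty.
D  = pdist(R', 'correlation');
s2 = silhouette([], cidx, D);
```

Using the same metric for clustering and for the silhouette keeps the quality score consistent with how the clusters were formed.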

The silhouette is used to assess the quality of a clustering. Each point gets a value between -1 and 1, and values close to 1 indicate a point that is well matched to its own cluster. The way this function works is illustrated below using a 100-by-3 matrix. Example:

K = 3;                               % number of clusters (used below as K)

numObservations = 100;
dimensions = 3;
data = rand([numObservations dimensions]);

%% cluster
opts = statset('MaxIter', 500, 'Display', 'iter');
[clustIDX, clusters, withinClustSum, Dist] = kmeans(data, K, 'Options',opts, ...
    'Distance','sqeuclidean', 'EmptyAction','singleton', 'Replicates',3);
%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 50, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 200, (1:K)', 'filled')
hold off, view(3)                    % hold on before plotting leaves a 2-D view
xlabel('x'), ylabel('y'), zlabel('z')

%% plot cluster quality
figure
[silh,h] = silhouette(data, clustIDX);   % draws the silhouette plot
avrgScore = mean(silh);                  % mean silhouette value over all points
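One common use of the average score is to compare several candidate numbers of clusters. The following is only a sketch of that idea, using the same kind of random 100-by-3 data as above; the range 2:6 and the variable names are arbitrary choices for illustration.

```matlab
rng(1);                              % fixed seed so the sketch is repeatable
data = rand(100, 3);                 % same shape as the example data
Ks = 2:6;                            % candidate cluster counts (arbitrary)
avgScore = zeros(size(Ks));

for i = 1:numel(Ks)
    idx = kmeans(data, Ks(i), 'Distance', 'sqeuclidean', 'Replicates', 3);
    % With a single output, silhouette returns the values without plotting.
    avgScore(i) = mean(silhouette(data, idx));
end

[~, best] = max(avgScore);
fprintf('Highest mean silhouette at K = %d\n', Ks(best));
```

On uniformly random data like this there is no real cluster structure, so the scores will be low for every K; the comparison is more informative on real data.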
