Pairwise Similarity and Sorting Samples

The following is a problem from an assignment that I am trying to solve:

Visualization of similarity matrix. Represent every sample with a four-dimension vector (sepal length, sepal width, petal length, petal width). For every two samples, compute their pair-wise similarity. You may do so using the Euclidean distance or other metrics. This leads to a similarity matrix where the element (i,j) stores the similarity between samples i and j. Please sort all samples so that samples from the same category appear together. Visualize the matrix using the function imagesc() or any other function.

Here is the code I have written so far:

load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
iris_distance = table2array(iris_copy); % convert the table to an array

% pairwise similarity
D = pdist(iris_distance); % calculate the Euclidean distance and store the result in D
W = squareform(D); % convert to squareform
figure()
imagesc(W); % visualize the matrix

I think the code above mostly answers the question. My issue is how to sort the samples so that those from the same category appear together, since I dropped the class names when I created the copy. Does converting to squareform already sort them? Any other suggestions? Thank you!

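No, squareform does not sort anything: row and column i of W correspond to row i of the input matrix, so the result is in the same order as the original data. While you could sort it afterwards, the easiest solution is to sort your data by class after line 2 and before line 3 of your code, that is, after renaming the variables and before copying the numeric columns.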

load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
% Sort the table here on the "Class" attribute. Don't forget to change the table name
% in the next line too if you need to.
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
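As a minimal sketch of the "sort before computing distances" approach, the snippet below builds a small stand-in table (six illustrative rows in shuffled class order, not the asker's iris.mat) and sorts it on the Class variable with sortrows before calling pdist. The variable names iris_sorted and features are assumptions introduced here for illustration; pdist and squareform require the Statistics and Machine Learning Toolbox, as in the original code.

```matlab
% Illustrative stand-in for the iris table: 6 samples, 2 classes, shuffled order.
iris = table([5.1; 7.0; 4.9; 6.4; 4.7; 6.9], ...
             [3.5; 3.2; 3.0; 3.2; 3.2; 3.1], ...
             [1.4; 4.7; 1.4; 4.5; 1.3; 4.9], ...
             [0.2; 1.4; 0.2; 1.5; 0.2; 1.5], ...
             {'setosa'; 'versicolor'; 'setosa'; 'versicolor'; 'setosa'; 'versicolor'}, ...
             'VariableNames', {'Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class'});

% Sort on the Class variable so samples of the same class are adjacent.
iris_sorted = sortrows(iris, 'Class');

% Keep only the four numeric features, now in class-sorted order.
features = table2array(iris_sorted(:, {'Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'}));

% Pairwise Euclidean distances as a square matrix.
W = squareform(pdist(features));

figure();
imagesc(W);   % same-class samples show up as blocks along the diagonal
colorbar;
```

With the rows grouped by class, the low-distance blocks along the diagonal of the image correspond to samples of the same class.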

Consider using sortrows. From the documentation:

tblB = sortrows(tblA,'RowNames') sorts a table based on its row names. Row names of a table label the rows along the first dimension of the table. If tblA does not have row names, that is, if tblA.Properties.RowNames is empty, then sortrows returns tblA.

For this question the sort key is a table variable rather than the row names, so the relevant form is sortrows(iris,'Class').
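If the distance matrix has already been computed in the original order, the "sort it afterwards" route can be sketched as below: obtain the permutation that groups the class labels, then apply it to both the rows and the columns of W. The data and the names X, labels, idx, and W_sorted are illustrative assumptions, not taken from the asker's file.

```matlab
% Illustrative data: four samples whose class labels are in shuffled order.
X = [5.1 3.5 1.4 0.2;
     7.0 3.2 4.7 1.4;
     4.9 3.0 1.4 0.2;
     6.4 3.2 4.5 1.5];
labels = {'setosa'; 'versicolor'; 'setosa'; 'versicolor'};

% Distance matrix in the original (unsorted) row order.
W = squareform(pdist(X));

% Permutation that puts equal labels next to each other.
[~, idx] = sort(labels);

% Apply the same permutation to rows and columns so that element (i,j)
% still refers to a consistent pair of samples.
W_sorted = W(idx, idx);

figure();
imagesc(W_sorted);
```

Permuting only the rows (or only the columns) would break the symmetry of the matrix, which is why the index is applied to both dimensions.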
