
Indexing duplicates in a matrix: Matlab

Consider a matrix

 X = [ 1 2 0 1; 
       1 0 1 2;                                          
       1 2 3 4;                                     
       2 4 6 8;
          .           
          .                          
       1 2 0 1                  
          .                 
          .    ]

I want to create a new column that numbers the occurrences of each row: the i-th time a given row appears, the new column should contain i.

Desired result:

   X = [ 1 2 0 1;   y =  [1
         1 0 1 2;         1                                 
         1 2 3 4;         1                            
         2 4 6 8;         1
           .             .
           .             .             
         1 2 0 1          2        
           .             .    
           .    ]        .]

Any ideas?

How about this?

y = sum(triu(squareform(pdist(X))==0)).';

This works by counting, for each row, how many rows up to and including it are equal to it. Two rows are equal if their distance (computed with squareform and pdist) is 0. triu keeps only the upper triangle, diagonal included, so each column counts the row itself plus any equal rows above it.

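To reduce computation time and avoid depending on the Statistics Toolbox, you can use @user1735003's suggestion, which expands the squared distance between rows a and b as sum(a.^2) + sum(b.^2) - 2*a*b.'. The exact comparison with 0 is reliable for integer-valued data; with non-integer values, rounding can leave a tiny nonzero distance between equal rows.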

y = sum(triu((bsxfun(@plus, sum(X.^2,2), sum(X.^2,2).') - 2*X*X.')==0)).';
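
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea on a five-row excerpt of the example matrix. The intermediate names sq, D2 and isSame are illustrative only and do not come from the answer, and the exact comparison with zero assumes integer-valued data.

```matlab
% Five-row excerpt of the example; rows 1 and 5 are identical
X = [1 2 0 1;
     1 0 1 2;
     1 2 3 4;
     2 4 6 8;
     1 2 0 1];

sq = sum(X.^2, 2);                          % squared norm of every row (N-by-1)
D2 = bsxfun(@plus, sq, sq.') - 2*(X*X.');   % squared distance between every pair of rows
isSame = (D2 == 0);                         % true where two rows are equal (assumes integer data)

% Keep row pairs (i,j) with i <= j, then count per column:
% column j counts row j itself plus every equal row above it
y = sum(triu(isSame), 1).';
```

For this excerpt the last row is the second occurrence of [1 2 0 1], so its count is 2 while all other rows get 1.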

Approach #1

%// unique rows
unqrows = unique(X,'rows'); 

%// matches for each row against the unique rows and their cumsum values
matches_perunqrow = squeeze(all(bsxfun(@eq,X,permute(unqrows,[3 2 1])),2));
cumsum_unqrows = cumsum(matches_perunqrow,1);

%// Reorder the matches by row and pick each row's cumsum value for the final output
[row,col] = find(matches_perunqrow);
[~,ind] = sort(row);
y = cumsum_unqrows(sub2ind(size(cumsum_unqrows),(1:size(cumsum_unqrows,1)).',col(ind)));

Sample run -

X =
     1     2     0     1
     1     0     1     2
     1     2     3     4
     2     4     6     8
     1     2     0     1
     1     2     3     4
     1     2     3     4
     1     2     3     4
     1     2     3     4
     1     2     0     1
y =
     1
     1
     1
     1
     2
     2
     3
     4
     5
     3

Approach #2

%// unique rows
unqrows = unique(X,'rows');

%// matches for each row against the unique rows
matches_perunqrow = all(bsxfun(@eq,X,permute(unqrows,[3 2 1])),2);

%// Get the cumsum of matches and keep only the value at each row's own match.
%// Since the output must follow the row order of X, transpose the result
cumsum_perrow = squeeze(cumsum(matches_perunqrow,1).*matches_perunqrow).';

%// Select the nonzero values for the final output
y = cumsum_perrow(cumsum_perrow~=0);

Approach #3

%// label each row with the index of the unique row it matches
[~,~,v3] = unique(X,'rows');
matches_perunqrow = bsxfun(@eq,v3,1:size(X,1));

cumsum_unqrows = cumsum(matches_perunqrow,1);

%// Reorder the matches by row and pick each row's cumsum value for the final output
[row,col] = find(matches_perunqrow);
[~,ind] = sort(row);
y = cumsum_unqrows(sub2ind(size(cumsum_unqrows),(1:size(cumsum_unqrows,1)).',col(ind)));

Approach #4

%// label each row based on their uniqueness
[~,~,match_row_id] = unique(X,'rows');

%// matches for each row against the unique rows and their cumsum values
matches_perunqrow = bsxfun(@eq,match_row_id.',(1:size(X,1)).');
cumsum_unqrows = cumsum(matches_perunqrow,2);

%// Select the cumsum values for the output based on the unique matches for each row
y = cumsum_unqrows(matches_perunqrow);
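
A small worked example may help show why the final logical indexing in Approach #4 returns the counts in row order. This is only an illustrative sketch on a hypothetical 3-row matrix; the variable names follow the code above, and the values in the comments were worked out by hand.

```matlab
% Hypothetical toy input: rows 1 and 3 are identical
X = [1 2; 3 4; 1 2];

[~,~,match_row_id] = unique(X,'rows');   % labels: [1; 2; 1]

% Entry (k,i) is true when row i of X carries label k
matches_perunqrow = bsxfun(@eq, match_row_id.', (1:size(X,1)).');
% matches_perunqrow = [1 0 1; 0 1 0; 0 0 0]

% Running count of each label across the rows of X (along dimension 2)
cumsum_unqrows = cumsum(matches_perunqrow, 2);
% cumsum_unqrows = [1 1 2; 0 1 1; 0 0 0]

% Logical indexing walks the matrix column by column, and each column
% (one row of X) holds exactly one true entry, so y follows the row order of X
y = cumsum_unqrows(matches_perunqrow);   % y = [1; 1; 2]
```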

A solution with a for loop is easy to write and may already be fast enough. There is probably a faster solution, perhaps one based on cumsum, but you may not even need it. The basic idea: first map each row to the index of its unique row, so that you work with scalar indices instead of full rows (vectors). Then loop over the indices and count how many times each index has occurred so far:

X = [ 1 2 0 1; 
   1 0 1 2;                                          
   1 2 3 4;                                     
   2 4 6 8;                        
   1 2 0 1;                 
   1 3 3 7;                 
   1 2 0 1];

[~,~,idx] = unique(X, 'rows'); %// idx(i) is the index of the unique row that row i matches

%// loop over the indices and count the occurrences so far (including the current one)
y = zeros(size(idx));
for i = 1:length(idx)
   y(i) = sum(idx(1:i) == idx(i)); %// rescans idx(1:i) on every pass, so the loop is O(n^2) overall
end

The result for the example is:

y =

 1
 1
 1
 1
 2
 1
 3
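
The faster, cumsum-based variant that this answer anticipates might look like the sketch below. It is not taken from any of the answers: it relies on sort keeping equal elements in their original order, the names order, isStart, startPos, groupNo and occ are introduced here for illustration, and it assumes X has at least one row.

```matlab
X = [1 2 0 1;
     1 0 1 2;
     1 2 3 4;
     2 4 6 8;
     1 2 0 1;
     1 3 3 7;
     1 2 0 1];

[~,~,idx] = unique(X, 'rows');   % scalar label for every row
n = numel(idx);

% Group equal labels together; ties stay in their original order
[sortedIdx, order] = sort(idx);

isStart  = [true; diff(sortedIdx) ~= 0];   % marks the first element of each group
startPos = find(isStart);                  % position where each group begins
groupNo  = cumsum(isStart);                % group number of every sorted element

% Position inside the group = occurrence number of that row
occ = (1:n).' - startPos(groupNo) + 1;

% Scatter the occurrence numbers back to the original row order
y = zeros(n, 1);
y(order) = occ;
```

After the call to unique, this replaces the quadratic rescan in the loop with one sort and a few linear passes, and it returns the same y as the loop for the example above.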
