
Get all two-factor products of the columns for every row of a matrix

I have an n x m matrix. Let x denote a row of this matrix. Each row holds a number of features x1, x2, x3, ...

Now, I would like to obtain the elements above the diagonal of the outer product x' * x (x being a row vector), namely: x1*x2, x1*x3, x2*x3, ... but not x1*x1. Also, if I have x1*x2, I do not need x2*x1.

I want to append these products to my matrix as new columns. With m original columns, that adds (m^2 + m)/2 - m = m(m-1)/2 more columns.
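As a quick sanity check of that count, here is a minimal sketch (m = 20 is only an illustrative value). The expression (m^2 + m)/2 - m is the number of unordered pairs of distinct columns, which the built-in nchoosek also reports:

```matlab
m = 20;                          % illustrative number of feature columns
extra = (m^2 + m)/2 - m;         % count as written in the question
pairs = nchoosek(m, 2);          % number of unordered pairs of distinct columns
fprintf('%d extra columns, %d pairs\n', extra, pairs);
```

For m = 20 both values are 190, so the output matrix would have 210 columns.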

This should be accomplished for each row of my matrix.

I have found a solution in Matlab already. However, it seems to be very slow and I am wondering whether there is a more vectorized solution available which Matlab could execute faster.

My current solution uses a package to get a vector of the elements above the upper diagonal: https://de.mathworks.com/matlabcentral/fileexchange/23391-triangular-and-diagonal-indexing

M(itriu(size(M),1)) returns the elements of a matrix M that lie above the diagonal. For example, for [1 2 3; 4 5 6; 7 8 9] I get 2 3 6 as the result.
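For reference, the same elements can probably be selected without the File Exchange package by using a logical mask built with triu. This is only a sketch, and it assumes that itriu returns its indices in ascending (column-major) order, which is what the 2 3 6 result suggests:

```matlab
M = [1 2 3; 4 5 6; 7 8 9];
mask = triu(true(size(M)), 1);   % true strictly above the main diagonal
vals = M(mask).';                % logical indexing walks M column by column
disp(vals)
```

Here vals comes out as 2 3 6, matching the itriu example.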

My code is as follows:

function [ X_out ] = permutateFeatures( X_in )
%PERMUTATEFEATURES given a matrix with m features in the columns
% and n samples in the rows, return a [n (m^2 + m)/2] matrix
% where each additional column contains an element-wise product of two of
% the original columns

n = size(X_in, 1);
m = size(X_in, 2);

X_out = [X_in zeros(n, (m^2 + m)/2 - m)];

for i = 1:n
    outerProduct = X_out(i,1:m)' * X_out(i,1:m);
    X_out(i,:) = [X_in(i,:) outerProduct(itriu(size(outerProduct),1))'];
end

end

Is there a more efficient solution?

Here's a vectorized solution -

[r,c] = find(triu(true(size(X_in,2)),1));
out = [X_in X_in(:,r).*X_in(:,c)];
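To see what those two lines do, here is a small worked example (the input values are made up). find on the strictly upper-triangular mask lists every column pair (r, c) with r < c, and the element-wise product of the two column selections then yields all pair products for all rows at once:

```matlab
X_in = [1 2 3; 4 5 6];                        % 2 samples, 3 features (made-up values)
[r, c] = find(triu(true(size(X_in,2)), 1));   % column pairs (r,c) with r < c
pairs = [r c];                                % rows: 1 2 / 1 3 / 2 3
out = [X_in X_in(:,r).*X_in(:,c)];            % originals followed by x1*x2, x1*x3, x2*x3
disp(out)
```

For this input, out is [1 2 3 2 3 6; 4 5 6 20 24 30].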

Runtime test

Timing code -

% Setup input array 
% (as stated in comments : m is mostly <20. n goes into the millions)
X_in = randi(5,[50000,20]);

disp('--------------------------- Original Solution')
tic,
n = size(X_in, 1);
m = size(X_in, 2);
X_out = [X_in zeros(n, (m^2 + m)/2 - m)];
for i = 1:n
    outerProduct = X_out(i,1:m)' * X_out(i,1:m);
    X_out(i,:) = [X_in(i,:) outerProduct(itriu(size(outerProduct),1))'];
end
toc

disp('--------------------------- Proposed Solution')
tic,
[r,c] = find(triu(true(size(X_in,2)),1));
out = [X_in X_in(:,r).*X_in(:,c)];
toc,

Timings -

--------------------------- Original Solution
Elapsed time is 8.618389 seconds.
--------------------------- Proposed Solution
Elapsed time is 0.131146 seconds.

That is a speedup of more than 60x!
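Besides timing, it may be worth confirming that both versions produce the same columns in the same order. The sketch below does that on a small random input. It replaces itriu with a triu mask, on the assumption that both enumerate the upper triangle in the same (column-major) order; the names X_loop and X_vec are only illustrative:

```matlab
X_in = randi(5, [200, 6]);                 % small made-up input
[n, m] = size(X_in);
mask = triu(true(m), 1);                   % built-in stand-in for itriu(size(...),1)

% Row-by-row version, following the loop in the question
X_loop = [X_in zeros(n, m*(m-1)/2)];
for i = 1:n
    outerProduct = X_in(i,:).' * X_in(i,:);
    X_loop(i,:) = [X_in(i,:) outerProduct(mask).'];
end

% Vectorized version
[r, c] = find(mask);
X_vec = [X_in X_in(:,r).*X_in(:,c)];

same = isequal(X_loop, X_vec);             % true if values and column order agree
disp(same)
```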

Here the multiplication is vectorized, which is by far the largest part of the computation. If you want, you can vectorize the creation of vec1 and vec2 as well, but there is only a little more efficiency to gain:

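m = size(X_in, 2);                  % number of feature columns
vec1 = [];
vec2 = [];
for i = 1:m
    vec1 = [vec1 i*ones(1, m-i)];   % left factor: column i, repeated m-i times
    vec2 = [vec2 (i+1):m];          % right factor: columns i+1 .. m
end
X_out2 = [X_in X_in(:,vec1).*X_in(:,vec2)];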

For a rand(1000,1000) input, the old approach and this one take

Elapsed time is 24.709988 seconds.
Elapsed time is 6.753230 seconds.

on my machine, with the same solution.
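The creation of vec1 and vec2 mentioned above could be vectorized in several ways. One possibility, shown here only as a sketch, is nchoosek(1:m, 2), which lists all pairs (i, j) with i < j, one per row. The input size and the names pairs, ref1 and ref2 are illustrative:

```matlab
X_in = rand(4, 5);                  % made-up input: 4 samples, 5 features
m = size(X_in, 2);

pairs = nchoosek(1:m, 2);           % one row per pair (i,j) with i < j
vec1 = pairs(:,1).';                % left factor indices
vec2 = pairs(:,2).';                % right factor indices
X_out2 = [X_in X_in(:,vec1).*X_in(:,vec2)];

% Loop-built indices, for comparison with the loop above
ref1 = [];
ref2 = [];
for i = 1:m
    ref1 = [ref1 i*ones(1, m-i)];
    ref2 = [ref2 (i+1):m];
end
disp(isequal(vec1, ref1) && isequal(vec2, ref2))
```

Note that this pair order runs row by row (1-2, 1-3, ..., 2-3, ...), whereas find(triu(...)) enumerates the upper triangle column by column. For four or more columns the product columns therefore appear in a different order, although they contain the same values.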
