
Group by in MATLAB to find the value that produced the minimum, similar to SQL

I have a dataset with columns a, b, c and d. I want to group the dataset by a and b and, for each group, find the value of c for which d is minimum. I can do the "group by" using grpstats as:

grpstats(M,[M(:,1) M(:,2) ],{'min'});

I don't know how to find the value of M(:,3) that corresponds to the minimum of d, i.e. of M(:,4).

In SQL I suppose we would use nested queries and the primary keys for that. How can I solve it in MATLAB?

Here is an example:

>> M =[4,1,7,0.3;
2,1,8,0.4;
2,1,9,0.2;
4,2,1,0.2;
2,2,2,0.6;
4,2,3,0.1;
4,3,5,0.8;
5,3,6,0.2;
4,3,4,0.5;]

>> grpstats(M,[M(:,1) M(:,2)],'min')
ans =

2.0000    1.0000    8.0000    0.2000
2.0000    2.0000    2.0000    0.6000
4.0000    1.0000    7.0000    0.3000
4.0000    2.0000    1.0000    0.1000
4.0000    3.0000    4.0000    0.5000
5.0000    3.0000    6.0000    0.2000

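But the third-column values in rows 1 and 4 of this result are wrong: they are the minimum of column 3 within each group, not the value of column 3 from the row where column 4 is minimal. The correct answer that I am looking for is: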

2.0000    1.0000    9.0000    0.2000
2.0000    2.0000    2.0000    0.6000
4.0000    1.0000    7.0000    0.3000
4.0000    2.0000    3.0000    0.1000
4.0000    3.0000    4.0000    0.5000
5.0000    3.0000    6.0000    0.2000

To conclude, I don't want the minimum of the third column; I want its value corresponding to the minimum in the 4th column.

I believe that

temp = grpstats(M(:, [1 2 4 3]),[M(:,1) M(:,2) ],{'min'});
result = temp(:, [1 2 4 3]);

would do what you require. If it doesn't, please explain in the comments and we can figure it out...

If I understand the documentation correctly, even

temp = grpstats(M(:, [1 2 4 3]), [1 2], {'min'});
result = temp(:, [1 2 4 3]);

should work (giving column numbers rather than full contents of columns)... Can't test right now, so can't vouch for that.

grpstats won't do this, and MATLAB doesn't make it as easy as you might hope.

Sometimes brute force is best, even if it doesn't feel like great MATLAB style:

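[b,~,n] = unique(M(:,1:2),'rows');   % b: unique (a,b) pairs, n: group index of each row of M
a = zeros(size(b,1),2);              % preallocate one [c d] pair per group
for i = 1:size(b,1)
    idx = find(n==i);                % rows of M that belong to group i
    [~,subidx] = min(M(idx,4));      % position of the smallest d within the group
    a(i,:) = M(idx(subidx),3:4);     % keep c and d from that row
end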

>> [b,a]
ans =
        2            1            9          0.2
        2            2            2          0.6
        4            1            7          0.3
        4            2            3          0.1
        4            3            4          0.5
        5            3            6          0.2
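
A loop-free variant of the same idea is also possible. The following is only a minimal sketch, assuming M is the numeric matrix from the question; the variable names S, first and result are made up for illustration. It sorts the rows by a, b and then d, so that the row with the smallest d comes first in each (a,b) group, and then keeps the first row of every group:

```matlab
M = [4,1,7,0.3;
     2,1,8,0.4;
     2,1,9,0.2;
     4,2,1,0.2;
     2,2,2,0.6;
     4,2,3,0.1;
     4,3,5,0.8;
     5,3,6,0.2;
     4,3,4,0.5];

% Sort by column 1, then column 2, then column 4 (ascending), so the
% minimum-d row is the first row of each (a,b) group.
S = sortrows(M, [1 2 4]);

% Index of the first row of every distinct (a,b) pair in the sorted matrix.
[~, first] = unique(S(:,1:2), 'rows', 'first');

% Whole rows are kept, so c and d stay paired.
result = S(first, :)
% result =
%     2    1    9    0.2
%     2    2    2    0.6
%     4    1    7    0.3
%     4    2    3    0.1
%     4    3    4    0.5
%     5    3    6    0.2
```

If several rows in a group share the same minimal d, this returns just one of them.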

