
Group by in MATLAB to find the value that resulted in the minimum, similar to SQL

I have a dataset with columns a, b, c and d. I want to group the dataset by a and b, and find the c for which d is minimum in each group. I can do the "group by" using 'grpstats' as:

grpstats(M,[M(:,1) M(:,2)],{'min'});

I don't know how to find the value of M(:,3) that resulted in the minimum of d.

In SQL I suppose we would use nested queries for that, together with the primary keys. How can I solve it in MATLAB?

Here is an example:

>> M =[4,1,7,0.3;
2,1,8,0.4;
2,1,9,0.2;
4,2,1,0.2;
2,2,2,0.6;
4,2,3,0.1;
4,3,5,0.8;
5,3,6,0.2;
4,3,4,0.5;]

>> grpstats(M,[M(:,1) M(:,2)],'min')
ans =

2.0000    1.0000    8.0000    0.2000
2.0000    2.0000    2.0000    0.6000
4.0000    1.0000    7.0000    0.3000
4.0000    2.0000    1.0000    0.1000
4.0000    3.0000    4.0000    0.5000
5.0000    3.0000    6.0000    0.2000

But the third-column values in rows 1 and 4 of this output (8 and 1) are wrong. The correct answer that I am looking for is:

2.0000    1.0000    9.0000    0.2000
2.0000    2.0000    2.0000    0.6000
4.0000    1.0000    7.0000    0.3000
4.0000    2.0000    3.0000    0.1000
4.0000    3.0000    4.0000    0.5000
5.0000    3.0000    6.0000    0.2000

To conclude, I don't want the minimum of the third column; I want its value from the row where the fourth column is minimal.

I believe that

temp = grpstats(M(:, [1 2 4 3]),[M(:,1) M(:,2) ],{'min'});
result = temp(:, [1 2 4 3]);

would do what you require. If it doesn't, please explain in the comments and we can figure it out...

If I understand the documentation correctly, even

temp = grpstats(M(:, [1 2 4 3]), [1 2], {'min'});
result = temp(:, [1 2 4 3]);

should work (giving column numbers rather than the full contents of the columns)... I can't test right now, so I can't vouch for that.

grpstats won't do this, and MATLAB doesn't make it as easy as you might hope.
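One way to see this, assuming grpstats applies 'min' to each column on its own (which is consistent with the output in the question), is to take a plain min over the rows of a single group. The names G, colMin and wanted are only for illustration:

```matlab
M = [4,1,7,0.3; 2,1,8,0.4; 2,1,9,0.2; 4,2,1,0.2; 2,2,2,0.6;
     4,2,3,0.1; 4,3,5,0.8; 5,3,6,0.2; 4,3,4,0.5];

% Rows belonging to the group a == 2, b == 1
G = M(M(:,1)==2 & M(:,2)==1, :);

% min works column by column, so c and d can come from different rows
colMin = min(G, [], 1);        % [2 1 8 0.2]

% What the question asks for is the whole row where d is smallest
[~, k] = min(G(:,4));
wanted = G(k, :);              % [2 1 9 0.2]
```

The second output of min is the piece that a per-column statistic cannot provide: it ties the minimum of d back to a row, and therefore to a value of c.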

Sometimes brute force is best, even if it doesn't feel like great MATLAB style:

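% b holds the unique (a,b) pairs; n maps each row of M to its group
[b,~,n] = unique(M(:,1:2),'rows');
a = zeros(size(b,1),2);              % preallocate [c d] for each group
for i = 1:size(b,1)
    idx = find(n==i);                % rows of M that belong to group i
    [~,subidx] = min(M(idx,4));      % position of the smallest d in the group
    a(i,:) = M(idx(subidx),3:4);     % keep c and d from that row
end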

>> [b,a]
ans =
        2            1            9          0.2
        2            2            2          0.6
        4            1            7          0.3
        4            2            3          0.1
        4            3            4          0.5
        5            3            6          0.2
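If a loop-free version is preferred, the same table can probably be obtained by sorting first and then keeping the first row of each group. This is a sketch rather than a tested replacement for the loop above; the names S, first and result are made up for the example, and it assumes that when two rows of a group tie on d, either one is acceptable:

```matlab
M = [4,1,7,0.3; 2,1,8,0.4; 2,1,9,0.2; 4,2,1,0.2; 2,2,2,0.6;
     4,2,3,0.1; 4,3,5,0.8; 5,3,6,0.2; 4,3,4,0.5];

% Sort by a, then b, then d ascending, so that the row with the
% smallest d comes first within every (a,b) group
S = sortrows(M, [1 2 4]);

% Index of the first row of each distinct (a,b) pair in the sorted matrix
[~, first] = unique(S(:,1:2), 'rows', 'first');

% One full row per group: a, b, the c at the minimum, and the minimum d
result = S(first, :);
```

For the example matrix this gives the six rows listed as the desired answer in the question. Because whole rows are selected rather than per-column statistics, c and d always come from the same original row.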
