
Compute mean of columns for groups of rows in Octave

I have a matrix, for example:

1 2
3 4
4 5

I also have a rule for grouping the rows, defined as a vector of group IDs like this:

1
2
1

This means that the first and third rows belong to the same group (ID 1) and the second row belongs to another group (ID 2). I would like to compute the mean of each column within each group. Here is the result for my example:

2.5 3.5
3 4

More formally, there is a matrix A of size (m, n), a number of groups k, and a vector v of size (m, 1) whose values are integers in the range from 1 to k. The result is a matrix R of size (k, n), where the row with index r holds the column means of group r.
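Written as a formula, the definition above reads as follows. The symbol n_r for the number of rows in group r is notation assumed here; it does not appear in the question.

```latex
R_{r,j} = \frac{1}{n_r} \sum_{i \,:\, v_i = r} A_{i,j},
\qquad n_r = \bigl|\{\, i : v_i = r \,\}\bigr|,
\qquad r = 1,\dots,k,\quad j = 1,\dots,n .
```

For the example, group 1 contains rows 1 and 3, so its row of R is ((1+4)/2, (2+5)/2) = (2.5, 3.5).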

Here is my solution (which does what I need) using a for-loop in Octave:

R = zeros(k, n);
for r = 1:k
    R(r, :) = mean(A((v == r), :), 1);
end

I wonder whether it could be vectorized. What I need is to replace the for-loop with a vectorized solution, which I expect to be much more efficient than the iterative one.

Here is one of my many attempts (none of which work) to solve the problem in a vectorized way:

R = mean(A((v == 1:k), :);

As long as your data is floating point, you can compute the mean manually: sum the rows of each group with accumdim, then divide by the group sizes. Like so:

octave:1> A = [1 2; 3 4; 4 5];
octave:2> subs = [1; 2; 1];
octave:3> accumdim (subs, A) ./ accumdim (subs, ones (rows (subs), 1))
ans =

   2.5000   3.5000
   3.0000   4.0000
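The same idea can be written with the number of groups passed explicitly, which matters when some group ID in 1..k has no rows. This is only a sketch: the variable names sums and counts are illustrative, and k = 3 is chosen here (it is not part of the original example) to show the empty-group case.

```octave
A = [1 2; 3 4; 4 5];   % data, one observation per row
v = [1; 2; 1];         % group ID of each row
k = 3;                 % assumed: one more group than appears in v

% Fourth argument of accumdim fixes the number of rows in the result.
sums   = accumdim (v, A, 1, k);                   % k-by-n per-group column sums
counts = accumdim (v, ones (rows (A), 1), 1, k);  % k-by-1 group sizes

% Broadcasting divides each row of sums by its group size.
% A group with no rows has sum 0 and count 0, so its row becomes NaN.
R = sums ./ counts
```

Whether NaN is the right answer for an empty group depends on the application; the for-loop version in the question produces NaN in that case as well, since it takes the mean of an empty selection.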

You can treat this as a matrix multiplication problem. For your example, this corresponds to

A = [1 2; 3 4; 4 5];
B = [0.5,0,0.5;0,1,0];

C = B*A
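To see why this product yields the group means, note that row r of B puts equal weight on the rows of A that belong to group r and zero weight elsewhere. In general, with v the vector of group IDs from the question and n_r the size of group r (notation assumed here):

```latex
B_{r,i} =
\begin{cases}
  1/n_r, & v_i = r,\\
  0,     & \text{otherwise},
\end{cases}
\qquad
(BA)_{r,j} = \sum_{i=1}^{m} B_{r,i}\, A_{i,j}
           = \frac{1}{n_r} \sum_{i \,:\, v_i = r} A_{i,j} .
```

In the example, group 1 has two rows (n_1 = 2), which gives the first row (0.5, 0, 0.5) of B, and group 2 has one row, which gives (0, 1, 0).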

The main issue is constructing B efficiently from your list of indices. My suggestion is to use the implicit expansion (broadcasting) of ==.

A = [1 2; 3 4; 4 5]; % Input data
idx = [1;2;1]; % Input Grouping

k = 2; % number of groups, ( = max(idx) )
m = 3; % Number of "observations"
Btmp = (idx == 1:k)'; % Mark locations
B = Btmp ./sum(Btmp,2); % Normalise
C = B*A

C =

    2.5000    3.5000
    3.0000    4.0000
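For a large number of observations, the dense k-by-m matrix B is mostly zeros, since each column has exactly one nonzero entry. A possible variation, not part of the answer above, is to build B as a sparse matrix directly. This is a minimal sketch; the name counts is illustrative.

```octave
A   = [1 2; 3 4; 4 5];   % input data, m-by-n
idx = [1; 2; 1];         % group ID of each row

m = rows (A);            % number of observations
k = max (idx);           % number of groups

counts = accumarray (idx, 1, [k 1]);   % size of each group

% B(r, i) = 1 / counts(r) where idx(i) == r, zero elsewhere.
B = sparse (idx, (1:m)', 1 ./ counts(idx), k, m);

C = full (B * A)         % k-by-n matrix of group means
```

One difference from the division-based versions: a group ID between 1 and k that never occurs produces a row of zeros in C rather than NaN, because the corresponding row of B is empty.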
