
Matlab accumarray for a 3D matrix

I really wish the documentation were clearer on the use of accumarray for N-D matrices in Matlab. Regardless, I'm totally confused here and looking for some advice. Here's the challenge:

I have a 3D matrix of data.

  • ROWS are individual rivers
  • COLUMNS are dates of observations
  • PAGES are the time intervals of data collection

For this example, let's assume each observation is the volume of water flowing through the meter in a 5-minute interval. However, I now wish to resample the data to intervals of N minutes (where N is a multiple of 5). Let's choose NMINS = 15.

So, what I want to do is find the sum or mean of the volume of water over NMINS-minute intervals. That is, the ROWS and COLUMNS will not change, but the dimensions and values for the PAGES will be decimated/aggregated.

I can get the grouping values for the pages, that is, the values I need to group by. If it were a single river for a single day, no problem. But I have hundreds of days and dozens of rivers.

I've gotten this far:

CREATE TEST TIME VECTOR

NMINS   = 15; % Bucket by every 15 mins or 20, etc.
N5MINS  = 5 * 12 * 24 * 2; % Keep small - two days expressed in minutes, sampled every 5 min below
dnums   = datenum(2016,3,20,0,0:5:N5MINS,0);
% Trim dnums to start at random time for edge case and keep only mins
mins    = rem(dnums(25:end-30),1)';    % Create column vector

CREATE RANDOM MATRIX FOR TESTING

rng(123); % Set seed for reproducibility
X       = randi(100,12,9,length(mins)); % Test matrix

FIND TIMES IN TERMS OF MINUTES

[~,~,~,H,M] = datevec( mins );
H       = 60 .* (H - H(1));
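
A possible alternative for this step, sketched under the assumption that every time stamp sits on a whole minute: the minute of the day can be read directly off the fractional part of the datenum, so H and M do not need to be combined by hand. The names minOfDay and isEdge are illustrative and not part of the original code. Note that buckets are then anchored at midnight rather than at the hour of the first sample, which only matters when NMINS does not divide 60.

```matlab
% Hypothetical test stamps: one hour of 5-minute datenums starting at midnight
dnums    = datenum(2016,3,20,0,0:5:60,0);

% Fractional day -> whole minutes since midnight (round guards against
% floating-point noise in the datenum arithmetic)
minOfDay = round(rem(dnums, 1) * 1440)';   % column vector

% True wherever a sample falls exactly on a 15-minute boundary
NMINS    = 15;
isEdge   = mod(minOfDay, NMINS) == 0;
```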

NOW FIND ALL TIMES THAT CORRESPOND TO OUR BUCKET

idxMIN  = mod( M+H, NMINS )==0;
idxNewP = idxMIN;          % This is used to align the new river matrix
[R,C,P] = size(X);         % We'll drop P and use newP
newP    = sum(idxNewP); % Number of PAGES in final matrix (new)
% Final newX will have dimensions [R,C,newP]

RESET THE GROUPING INDICES

% Shift the flags by one sample, because each bucket time stamp covers the
% data UP to that point. Test whether the first sample is a bucket
% match and set the grouping indices accordingly
if idxMIN(1)
    idxMIN = [1; idxMIN(1:end-1)]; 
    subs = cumsum(idxMIN);
else 
    idxMIN = [0; idxMIN(1:end-1)];
    subs = cumsum(idxMIN) + 1 ;
end
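
Both branches above appear to reduce to the same expression, since cumsum([0; rest]) + 1 equals cumsum([1; rest]). A minimal sketch of that observation on a short, made-up minute vector (t, isEdge and counts are hypothetical names), which also uses accumarray to count the samples per group:

```matlab
% Hypothetical minute stamps at 5-minute spacing: 2:00, 2:05, ..., 2:40
NMINS  = 15;
t      = (120:5:160)';
isEdge = mod(t, NMINS) == 0;          % true where a bucket closes

% A sample belongs to the bucket that closes at or after it, so the
% group number increases on the sample AFTER each edge
subs   = cumsum([1; isEdge(1:end-1)]);

% Number of samples per group: the first and last groups may be partial
counts = accumarray(subs, 1);
```

Here subs comes out as 1 2 2 2 3 3 3 4 4 and counts as 1 3 3 2, so the first group has a single sample, as in the table further down.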

ADDITION: The group size will not be consistent, and sadly I cannot make that assumption. Consider the following after running the above.

tsttbl = table();
tsttbl.dnumstr = datestr(mins(1:5));
tsttbl.Mins    = M(1:5);
tsttbl.subs    = subs(1:5);
tsttbl

The output shows that the first group has a size of 1:

tsttbl = 

    dnumstr     Mins    subs
    ________    ____    ____

     2:00 AM     0      1   
     2:05 AM     5      2   
     2:10 AM    10      2   
     2:15 AM    15      2   
     2:20 AM    20      3  

Finally, I'll ultimately need to pass custom functions. The above is a toy example to illustrate the problem quickly. My apologies for not being clearer.

END ADDITION

And this is where I stumble...

How do I set the subs values to apply along each page to use accumarray? I'm totally confused by the documentation. Or is this actually the wrong approach? For what it's worth, I'm using Matlab 2015b.

Honestly, any help would be greatly appreciated.

ALTERNATE SOLUTION: This hit me on the way home. Meshgrid is my friend...

Once the cells above have been run (note I changed the size of the matrix X), we create the indices for the entire matrix, where the "indices" for the pages (i.e., times) are given by the values in "subs". To do this, I use meshgrid.

[days,rivers,pages] = meshgrid(1:C,1:R,subs);
grpvals = [rivers(:) days(:) pages(:)];
tst     = accumarray(grpvals,X(:),[R C newP],@sum);

This is probably not the most memory-efficient approach, as I essentially have to create the days, rivers, and pages matrices, then wind up creating a new grpvals array from those. But it has the advantage that I can now use accumarray and pass anonymous functions, @std, etc.
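
A minimal sketch of passing a custom function this way, on small made-up data. It uses ndgrid instead of meshgrid so the outputs come back in (row, column, page) order, and it sizes the result with max(subs) rather than newP, so that a trailing partial bucket cannot exceed the output size. The data and the names nG, grp and spread are hypothetical.

```matlab
% Hypothetical data: 2 rivers, 3 days, 5 time samples
X    = reshape(1:30, 2, 3, 5);
subs = [1 2 2 3 3]';                  % bucket index of each page

[R, C, ~] = size(X);
nG   = max(subs);                     % number of output pages

% One subscript triple per element of X, in the same order as X(:)
[rivers, days, pages] = ndgrid(1:R, 1:C, subs);
grp  = [rivers(:) days(:) pages(:)];

% Custom reduction: range (max minus min) within each bucket
spread = accumarray(grp, X(:), [R C nG], @(v) max(v) - min(v));
```

Because consecutive pages of this X differ by 6 everywhere, spread(:,:,1) is all zeros (a single-page bucket) and the other two pages are all 6. One caveat: accumarray does not promise the order in which values reach the function unless the subscripts are sorted, so order-sensitive functions need extra care.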

Hope this helps others!

Huge thanks to Luis.

If all groups have the same size

You can do the aggregation as follows:

  1. reshape along a 4th dimension to build the groups you want to aggregate. The 3rd dimension now refers to elements of each group, and the 4th dimension refers to groups.
  2. sum along the 3rd dimension (each group).
  3. squeeze out the now-singleton 3rd dimension to recover a 3D array.

Code:

X = randi(9,2,3,6); %// example data. 3D array.
G = 2; %// group size along 3rd dim. Divides size(X,3)
result = squeeze(sum(reshape(X, size(X,1), size(X,2), G, []), 3));

For example, with G = 2,

X(:,:,1) =
     2     3     9
     4     5     9
X(:,:,2) =
     3     8     2
     6     9     8
X(:,:,3) =
     4     4     4
     1     1     7
X(:,:,4) =
     9     9     8
     2     4     1
X(:,:,5) =
     9     5     9
     3     5     8
X(:,:,6) =
     9     1     3
     5     3     1

gives

result(:,:,1) =
     5    11    11
    10    14    17
result(:,:,2) =
    13    13    12
     3     5     8
result(:,:,3) =
    18     6    12
     8     8     9
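
The same reshape idea can be adapted to a mean instead of a sum. The sketch below is one way to do it on made-up data (Z and avg are hypothetical names); it replaces squeeze with an explicit reshape, on the assumption that you want the result to stay 3D even when the first or second dimension has size 1.

```matlab
% Hypothetical data: 2-by-2 pages, 6 pages, grouped in threes
X   = reshape(1:24, 2, 2, 6);
G   = 3;                              % group size, must divide size(X,3)

% Dim 3 indexes the members of a group, dim 4 indexes the groups
Z   = reshape(X, size(X,1), size(X,2), G, []);

% Average over the members, then drop the singleton 3rd dimension
avg = reshape(mean(Z, 3), size(X,1), size(X,2), []);
```

With this X, each group mean equals the middle page of the group, so avg(:,:,1) is [5 7; 6 8] and avg(:,:,2) is [17 19; 18 20].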

General case: groups with possibly different sizes

Since accumarray doesn't accept a multidimensional array (or even a matrix) as its second input, you can use matrix multiplication along the lines of this answer. For that you need to pack the first two dimensions of your 3D array into one dimension (which will be unpacked at the end), and from the group indices build a zero-one matrix that will give the desired result through matrix multiplication.

X = randi(9,2,3,5); %// example data. 3D array.
subs = [1 2 2 1 1]; %// indices of groups. Groups may differ in size, and indices
                    %// need not be sorted
Y = reshape(X, [], size(X,3)); %// reshape into a matrix: one column per page, so groups run along each row
M = full(sparse(1:numel(subs), subs, 1)); %// indicator matrix from group indices
result = reshape(Y*M, size(X,1), size(X,2), []); %// compute result and reshape 

For example,

X(:,:,1) =
     9     3     8
     6     8     8
X(:,:,2) =
     3     8     3
     7     2     2
X(:,:,3) =
     7     3     6
     2     8     5
X(:,:,4) =
     7     4     5
     8     8     6
X(:,:,5) =
     2     3     2
     2     8     8

subs =
     1     2     2     1     1

gives

result(:,:,1) =
    18    10    15
    16    24    22
result(:,:,2) =
    10    11     9
     9    10     7
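
For a group mean rather than a sum, one option is to scale each column of the indicator matrix by its group size before multiplying. This is a sketch on made-up data, assuming every index from 1 to max(subs) occurs at least once (otherwise a column sum is zero and the division yields NaN). bsxfun is used so that it also runs on releases such as R2015b that lack implicit expansion; avg is a hypothetical name.

```matlab
% Hypothetical data: 2-by-3 pages, 5 pages, unsorted unequal groups
X    = reshape(1:30, 2, 3, 5);
subs = [1 2 2 1 1];

Y = reshape(X, [], size(X,3));              % one column per page
M = full(sparse(1:numel(subs), subs, 1));   % pages-by-groups indicator

% Divide each column by its group size so Y*M averages instead of sums
M = bsxfun(@rdivide, M, sum(M, 1));

avg = reshape(Y*M, size(X,1), size(X,2), []);
```

Only sums and weighted sums such as the mean fit this matrix-multiplication pattern; arbitrary functions like @std still need accumarray or a loop.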
