
Matlab `rowfun` function with multiple outputs: Safe to assume row order?

I tried providing a function to rowfun that returns multiple-row output, of the same height as the input. It seems to work as expected.

% Example table with 2-column-array as a single data field
x = table( [1;1;2;2] , [[2;2;1;1] [2;1;2;1]] , ...
           'VariableNames' , {'idx' 'Field2columns'} )

  x = idx    Field2columns
      ___    _____________
      1      2    2       
      1      2    1       
      2      1    2       
      2      1    1       

% Example anonymous function that takes all rows with the same idx value
% and reverses their row order
y = rowfun( @(z) z(end:-1:1,:) , x , 'Input','Field2columns' , ...
            'Grouping','idx' , 'OutputVar','OutVar' )

  y =        idx    GroupCount    OutVar
             ___    __________    ______
      1      1      2             2    1
      1_1    1      2             2    2
      2      2      2             1    1
      2_1    2      2             1    2

% Append the generated data to original table
[ x y(:,{'OutVar'}) ]

  ans =      idx    Field2columns    OutVar
             ___    _____________    ______
      1      1      2    2           2    1
      1_1    1      2    1           2    2
      2      2      1    2           1    1
      2_1    2      1    1           1    2

This makes for very efficient code. Otherwise I would have to loop through all distinct values of x.idx, extract the matching rows of x for each value, generate the row-reversed subset, and compile the results.

My only concern is that I am assuming the row order of the anonymous function's output is maintained, and that each output row aligns with the corresponding row in table x. For example, if idx=7, then the Nth row of x for which idx=7 will be appended on the right with the Nth row of the anonymous function's output when it is applied to x(x.idx==7,:).
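To make the intended alignment concrete, here is a minimal sketch of the explicit loop that the rowfun call is meant to replace. The variable names `expected`, `vals`, `rows`, and `sub` are only illustrative. It writes each reversed block back to the row positions it came from, so it can serve as a reference against which the rowfun output is compared.

```matlab
% Same example table as above
x = table( [1;1;2;2] , [[2;2;1;1] [2;1;2;1]] , ...
           'VariableNames' , {'idx' 'Field2columns'} );

% Reference result built group by group with explicit row positions
expected = zeros(size(x.Field2columns));
vals = unique(x.idx);
for k = 1:numel(vals)
    rows = find(x.idx == vals(k));       % positions of this group's rows in x
    sub  = x.Field2columns(rows,:);      % rows of the group, in their order in x
    expected(rows,:) = sub(end:-1:1,:);  % reversed block, written back in place
end

% With y being the rowfun result shown earlier, the assumption can be
% checked with: isequal(y.OutVar, expected)
```

Such a check only confirms the behaviour for the data and MATLAB release it is run on; it does not make the behaviour documented.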

The rowfun documentation doesn't cover the case in which the first argument is a function that returns multi-row output. I have only the observed behaviour to rely on. Would it be advisable to exploit this behaviour to streamline my code, or is it bad practice to rely on such undocumented behaviour? For example, corner cases may not be covered, and there is no obligation for TMW to maintain the current behaviour in the future.

The documentation for rowfun, under 'GroupingVariables', says:

The output, B, contains one row for each group.

So if you get more than one row per group, you are definitely treading in undocumented waters. A future version could throw an error with your code.

Regarding the order of the input rows to your function: I would suggest you ask MathWorks about the order of the rows that share the same grouping variables. One way would be to go to the bottom of the documentation page, select a star rating, then say in the text box that the documentation is incomplete because it doesn't specify the order of the rows when this option is given. The documentation folk like the docs to be thorough and complete, so they might answer this question by completing the documentation.

If you want to stay in the documented zone, you can use the very handy splitapply for that. To deal with the multiple rows in the output, you can put them in a cell and then convert the result to a table:

y = splitapply(@(z) {z(end:-1:1,:)},x.Field2columns,x.idx) % note the {...} in the function
[x table(cell2mat(y),'VariableNames',{'OutVar'})] % this is like: [x y(:,{'OutVar'})]

I guess this will be less efficient, but it keeps your code within the documented behaviour of the functions, without a need to use loops.
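One caveat with the splitapply snippet: cell2mat(y) stacks the groups in group order, so the result lines up with x only when the rows of x are already arranged in contiguous groups with ascending idx, as in the example. A possible variant that does not depend on that, nor on the order in which rows are handed to the function, is to split the row numbers instead of the data and scatter the result back. This is only a sketch; `G`, `rowIdx`, `pairs`, and `OutVar` are names made up for the illustration, and findgroups is used to turn idx into group numbers.

```matlab
% Same example table as in the question
x = table( [1;1;2;2] , [[2;2;1;1] [2;1;2;1]] , ...
           'VariableNames' , {'idx' 'Field2columns'} );

G      = findgroups(x.idx);    % group number for every row of x
rowIdx = (1:height(x))';       % row positions in x

% For each group return [destination row, source row] pairs. Sorting inside
% the function makes the pairing independent of the order in which
% splitapply passes the rows of a group.
pairs = splitapply(@(r) {[sort(r) , sort(r,'descend')]} , rowIdx , G);
pairs = cell2mat(pairs);

% Scatter the reversed rows back to their positions in x
OutVar = zeros(size(x.Field2columns));
OutVar(pairs(:,1),:) = x.Field2columns(pairs(:,2),:);
x.OutVar = OutVar;
```

Because every output row is written to an explicit row position, the new variable stays aligned with x even if the groups are interleaved.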

