
Matlab `unstack`: Safe to assume ordering of new columns?

According to the documentation, Matlab's unstack can take this table:

S=12×3 table
Storm    Town    Snowfall
_____    ____    ________

  3      'T1'        0   
  3      'T3'        3   
  1      'T1'        5   
  3      'T2'        5   
  1      'T2'        9   
  1      'T3'       10   
  4      'T2'       12   
  2      'T1'       13   
  4      'T3'       15   
  2      'T3'       16   
  4      'T1'       17   
  2      'T2'       21   

...and convert it into:

U = unstack(S,'Snowfall','Town')

U=4×4 table
Storm    T1    T2    T3
_____    __    __    __

  3       0     5     3
  1       5     9    10
  4      17    12    15
  2      13    21    16

It seems reasonable to assume that the new columns are generated in alphabetic order. To assume this would be fine if one is manually manipulating data, but is a deal breaker for automated data processing if one cannot be 100% assured of the ordering of the columns. For example, if the Town column was actually a numerical index, then the new column names would be automatically generated so as to be legitimate variable names, and the ordering would be the key piece of information linking the new columns back to the values in the Town field. If one extracts U{:,2:end} for manipulation, the data could be all wrong unless one could be 100% sure of whatever the scheme is for ordering the new columns.

I actually create a new column in place of Town containing a valid string, suffixed with the numerical index value. These become the new column headings. But the reality is, having to write extra code to assure that the columns appear in the right order is too much trouble. It cancels out the benefit of unstack, and I ended up just creating loops to build up the new columns one by one. Not efficient or elegant in terms of time and code. I am trying to find a way to reliably exploit unstack in the future.

I have already submitted feedback describing the criticality of this bit of information, but I don't expect a response back any time soon. Meanwhile, unstacking is such a useful function that I wonder whether anyone can weigh in about the advisability of assuming alphabetic ordering of the new columns?

Yes. From what I understood of the source code of unstack.m (you can read it by typing edit unstack), the new columns come out in alphabetical order, more precisely in Unicode order: a function converts each identifier to a unique index, and only afterwards is each identifier checked for validity as a variable name.

The Unicode order will mean in particular:

  • T10 will come before T9.
  • t10 will come after T10.
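
As a quick illustration of that ordering, here is a minimal sketch that uses MATLAB's sort on a cell array of character vectors, which orders by character code. It only stands in for what unstack does internally (an assumption based on the reading of the source above), and the labels are made up for the example.

```matlab
% Hypothetical labels; sort orders cellstr input by character code,
% so digits compare character by character and uppercase precedes lowercase.
labels = {'T9', 't10', 'T10'};
sorted = sort(labels);
disp(sorted)   % T10 comes before T9, and t10 comes last
```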

According to unstack, the function that converts the identifier to a unique index, subs2inds, relies on a class, tabularDimension, which is marked (as of R2018b) as internal and subject to change:

%tabularDimension Internal abstract class to represent a tabular's dimension.

% This class is for internal use only and will change in a
% future release.  Do not use this class.

After the identifiers are sorted, their validity is checked with the function matlab.lang.makeValidName (using the default option 'Prefix','x'), which modifies any identifier that is not valid (by default, replacing each illegal character with an underscore).

A valid MATLAB identifier is a character vector of alphanumerics (A–Z, a–z, 0–9) and underscores, such that the first character is a letter and the length of the character vector is less than or equal to namelengthmax.

makeValidName deletes any whitespace characters before replacing any characters that are not alphanumerics or underscores. If a whitespace character is followed by a lowercase letter, makeValidName converts the letter to the corresponding uppercase character.

For example:

  • 2A will be changed to x2A.
  • ça will be changed to x_A.

Name collisions that remain after this step are resolved with the matlab.lang.makeUniqueStrings function.

For example, if you ask for the identifiers ç1 and à1, Matlab will still be able to distinguish them, and renames them x_1_1 and x_1 respectively.
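
The two renaming steps can be tried in isolation. This is a small sketch that calls the two documented functions directly; it does not reproduce the exact sequence of calls inside unstack, and the input names are chosen only for illustration.

```matlab
% Step 1: names that are not valid identifiers are rewritten.
% '2A' starts with a digit, so the default prefix 'x' is added.
[fixed, wasModified] = matlab.lang.makeValidName({'2A', 'T1'});
disp(fixed)         % 'x2A' and 'T1'
disp(wasModified)   % true for '2A', false for 'T1'

% Step 2: names that collide after step 1 get a numeric suffix.
deduped = matlab.lang.makeUniqueStrings({'x_1', 'x_1'});
disp(deduped)       % 'x_1' and 'x_1_1'
```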


In your case, I would suggest automatically generating column names with a constant starting letter, followed by the index padded with leading zeros to a constant number of characters: T0001, T0002, ..., T0100, ..., T9999.
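
Here is a minimal sketch of that suggestion, assuming Town is stored as a numeric index. The table contents, the 'T%04d' format, and the variable names townCols, townIdx, and order are all invented for the example. As an extra safeguard, it recovers the index from each generated column name and reorders explicitly, so the extracted matrix does not depend on the order in which unstack emits the columns.

```matlab
% Hypothetical data: Town is a numeric index instead of a name.
Storm    = [3; 3; 1; 3; 1; 1];
Town     = [1; 3; 1; 2; 2; 3];
Snowfall = [0; 3; 5; 5; 9; 10];
S = table(Storm, Town, Snowfall);

% Replace the index with a fixed-width, zero-padded label (T0001, T0002, ...),
% which is already a valid variable name and sorts the same way as the index.
S.Town = arrayfun(@(k) sprintf('T%04d', k), S.Town, 'UniformOutput', false);

U = unstack(S, 'Snowfall', 'Town');

% Recover the numeric index from each new column name and order by it,
% rather than trusting the position of the columns in U.
townCols = setdiff(U.Properties.VariableNames, {'Storm'}, 'stable');
townIdx  = cellfun(@(n) sscanf(n, 'T%d'), townCols);
[~, order] = sort(townIdx);

% Numeric matrix whose columns follow Town index 1, 2, 3.
data = U{:, townCols(order)};
```

With this approach, U{:,2:end} is never used directly, so a future change in how unstack orders its columns would not silently scramble the data.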
