How to transpose a 4D tensor in C++?

Question

I need to pre-process the input of an ML model into the correct shape. To do that, I need to transpose an ncnn tensor in C++. The API does not offer a transpose, so I am trying to implement my own transpose function.

The input tensor has the shape (1, 640, 640, 3) (batch, x, y and color), and I need to reshape it to (1, 3, 640, 640).

How do I properly and efficiently transpose the tensor?

ncnn::Mat preprocess(const cv::Mat& rgba) {
    int width = rgba.cols;
    int height = rgba.rows;

    // Build a tensor from the image input
    ncnn::Mat in = ncnn::Mat::from_pixels(rgba.data, ncnn::Mat::PIXEL_RGBA2RGB, width, height);

    // Set the current shape of the tensor
    in = in.reshape(1, 640, 640, 3);

    // Normalize
    const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
    in.substract_mean_normalize(0, norm_vals);

    // Prepare the transposed matrix
    ncnn::Mat transposed(in.w, in.c, in.h, in.d, sizeof(float));
    ncnn::Mat shape = transposed.shape();

    // Transpose
    
    for (int i = 0; i < in.w; i++) {
        for (int j = 0; j < in.h; j++) {
            for (int k = 0; k < in.d; k++) {
                for (int l = 0; l < in.c; l++) {
                    int fromIndex = ???;
                    int toIndex = ???;
                    transposed[toIndex] = in[fromIndex];
                }
            }
        }
    }

    return transposed; 
}

Answer 1

I'm only covering the index calculations here, not the ncnn API, which I'm not familiar with.

You set

fromIndex = i*A + j*B + k*C + l*D;
  toIndex = i*E + j*F + k*G + l*H;

where you compute the coefficients A through H from the source and target layouts. How?

Let's look at a simple 2D transposition first. Transpose an hw-layout matrix into a wh-layout matrix (layouts are written slowest-changing dimension first):

  for (int i = 0; i < h; ++i) {
      for (int j = 0; j < w; ++j) {
          int fromIndex = i * w + j * 1;
          //              ^       ^
          //              |       |
          //             i<h     j<w        <---- hw layout

          int   toIndex = j * h + i * 1;
          //              ^       ^
          //              |       |
          //             j<w     i<h        <---- wh layout
      }      
  }
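To make the 2D case concrete, here is a minimal runnable sketch of the loop above. It assumes the matrix lives in a plain contiguous std::vector<float> in row-major order; the function name transpose2d is made up for illustration and has nothing to do with ncnn.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: transpose a row-major h x w matrix (hw layout)
// into a row-major w x h matrix (wh layout).
std::vector<float> transpose2d(const std::vector<float>& src,
                               std::size_t h, std::size_t w) {
    assert(src.size() == h * w);  // the buffer must hold exactly h*w elements
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < h; ++i) {
        for (std::size_t j = 0; j < w; ++j) {
            std::size_t fromIndex = i * w + j * 1;  // hw layout
            std::size_t toIndex   = j * h + i * 1;  // wh layout
            dst[toIndex] = src[fromIndex];
        }
    }
    return dst;
}
```

Calling transpose2d on the 2x3 matrix {1, 2, 3, 4, 5, 6} with h = 2 and w = 3 gives the 3x2 matrix {1, 4, 2, 5, 3, 6}.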

So when computing fromIndex, you start with the source layout (hw). Remove the first letter (h); what remains (w) is the coefficient that goes with i. Remove the next letter (w); nothing remains, so the coefficient that goes with j is 1. The same pattern works in any number of dimensions: each index is multiplied by the product of the extents that come after it in the layout. For example, if your source layout is dchw, then you have

fromIndex = i * (c*h*w) + j * (h*w) + k * (w) + l * (1);
//          ^             ^           ^         ^
//          |             |           |         |
//         i<d           j<c         k<h       l<w   <---- dchw
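The "remove the leading letters, multiply what remains" rule can also be computed mechanically. The following is a small sketch under the assumption of a dense 4D layout whose extents are listed slowest-changing first; rowMajorStrides is a hypothetical helper name, not part of any library.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: given the extents of a 4D layout, slowest-changing
// dimension first (e.g. {d, c, h, w}), return the coefficient of each index.
// The coefficient of a dimension is the product of all extents after it.
std::array<std::size_t, 4> rowMajorStrides(const std::array<std::size_t, 4>& extents) {
    std::array<std::size_t, 4> strides{};
    std::size_t running = 1;  // product of the extents seen so far, from the right
    for (std::size_t n = 4; n-- > 0;) {
        strides[n] = running;
        running *= extents[n];
    }
    return strides;
}
```

For a dchw layout with d = 2, c = 3, h = 4, w = 5, this yields {60, 20, 5, 1}, which are exactly c*h*w, h*w, w and 1.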

What about toIndex? Same thing, but arrange the indices from slowest-changing to fastest-changing in the target layout. For example, if your target layout is hwcd, the order is klji (because i is the index that ranges over [0..d) in both the source and target layouts, and so on). So

  toIndex = k * (w*c*d) + l * (c*d) + j * (d) + i * (1);
  //        ^             ^           ^         ^
  //        |             |           |         |
  //       k<h           l<w         j<c       i<d   <---- hwcd

I deliberately did not use your layouts. Work through the calculation for your own layouts a couple of times; you want to develop some intuition for this.
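If you want something to check your own calculations against, here is a sketch of the dchw to hwcd example as a complete function. It assumes a plain contiguous std::vector<float>; it does not use ncnn, and whether an ncnn::Mat stores its channels contiguously is a separate question not covered here. The name dchwToHwcd is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a dense dchw-layout buffer into a dense
// hwcd-layout buffer using the index formulas derived above.
std::vector<float> dchwToHwcd(const std::vector<float>& src,
                              std::size_t d, std::size_t c,
                              std::size_t h, std::size_t w) {
    assert(src.size() == d * c * h * w);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < d; ++i) {
        for (std::size_t j = 0; j < c; ++j) {
            for (std::size_t k = 0; k < h; ++k) {
                for (std::size_t l = 0; l < w; ++l) {
                    // Source layout dchw: i<d, j<c, k<h, l<w
                    std::size_t fromIndex = i * (c * h * w) + j * (h * w) + k * w + l;
                    // Target layout hwcd: k<h, l<w, j<c, i<d
                    std::size_t toIndex = k * (w * c * d) + l * (c * d) + j * d + i;
                    dst[toIndex] = src[fromIndex];
                }
            }
        }
    }
    return dst;
}
```

The innermost loop walks the source sequentially and scatters into the target. Swapping the loop order so that the target is written sequentially is an equally valid choice; which one is faster depends on the sizes and the cache behavior, so measure if it matters.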


solution1
1 ACCEPTED 2022-06-08 21:38:00
