I am working on a project based on the OpenPose research paper that I read two weeks ago. In that, the model is supposed to give a 5-dimensional output. For example, torch.nn.conv2d()
gives a 4-D output of the following shape: (Batch_size, n_channels, input_width, input_height)
. What I need is an output of the following shape: (Batch_size, n_channels, input_width, input_height, 2)
. Here 2
is a fixed number not subject to any changes. The 2
is there because each entry is a 2-dimensional vector hence for each channel in every pixel position, there are 2 values hence, the added dimension.
What will be the best way to do this? I thought about having 2 seperate branches for each of the vector values but the network is very deep and I would like to be as Computationally efficient as possible.
So you are effectively looking to compute feature maps which are interpreted as 2-dimensional vectors. Unless there is something fancy math-wise happening there, you are probably fine with just having twice as many output channels: (batch_size, n_channels * 2, width, height)
, and then reshaping it as
output5d = output4d.reshape(
output4d.shape[0],
output4d.shape[1] / 2,
2,
output4d.shape[2],
output4d.shape[3]
)
which gives you a shape of (batch_size, n_channels, 2, width, height)
. If you really want to have 2
as the last dimension, you can use transpose
:
output5d = output5d.transpose(2, 4)
but if there is no strong argument in favor of this layout, I would suggest you do not transpose as it always costs a bit of performance.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.