
Where does the third dimension (as in 4x4x4) of tensor cores come from?

As I understand it, NVIDIA's tensor cores multiply two 4x4 matrices and add the result to a third matrix. Multiplying two 4x4 matrices produces a 4x4 matrix, and adding two 4x4 matrices produces a 4x4 matrix. Yet "Each Tensor Core provides a 4x4x4 matrix processing array".

Four multiply-accumulate operations are needed for each row·column pair. I thought maybe the last x4 comes from the intermediate results before accumulation, but that doesn't quite fit the description on NVIDIA's pages.

"The FP16 multiply results in a full precision result that is accumulated in FP32 operations with the other products in a given dot product for a 4x4x4 matrix multiply, as Figure 9 shows." https://developer.nvidia.com/blog/cuda-9-features-revealed/

A 4x4x4 matrix multiply? I thought matrices were two-dimensional by definition.

Can someone please explain where the last x4 comes from?

"The cube itself represents the 64 element products needed to generate the complete 4x4 product matrix" (cvw.cac.cornell.edu/GPUarch/tensor_cores). The last x4 is made up of the intermediate products before accumulation.

4x4x4 is just the notation for the multiplication of one 4x4 matrix by another 4x4 matrix.

If you were to multiply a 4x8 matrix by an 8x4 matrix, you would have 4x8x4. So if A is NxK and B is KxM, the operation can be referred to as an NxKxM matrix multiply.
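A minimal sketch in plain Python (a naive triple loop, not the tensor-core hardware itself; the function name is made up for illustration) of why an NxKxM multiply involves exactly N·K·M multiply-accumulate operations — which is where the third 4 in 4x4x4 comes from:

```python
# Multiply an N x K matrix A by a K x M matrix B the naive way,
# counting multiply-accumulate (MAC) operations as we go.
def matmul_count_macs(A, B):
    N, K = len(A), len(A[0])
    M = len(B[0])
    C = [[0.0] * M for _ in range(N)]
    macs = 0
    for i in range(N):          # rows of A
        for j in range(M):      # columns of B
            for k in range(K):  # the shared "inner" dimension
                C[i][j] += A[i][k] * B[k][j]  # one element product
                macs += 1
    return C, macs

# For two 4x4 matrices, N = K = M = 4, so there are 4*4*4 = 64
# element products -- the "cube" of intermediate products that the
# tensor core computes before accumulating them into each C[i][j].
A = [[1.0] * 4 for _ in range(4)]
B = [[1.0] * 4 for _ in range(4)]
C, macs = matmul_count_macs(A, B)
print(macs)  # 64
```

Each of the three loop bounds supplies one factor of the NxKxM label, so the notation describes the shape of the whole operation rather than the shape of any single matrix.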

I briefly looked this up and found a paper that uses this exact notation (e.g. in Section 4.6 on page 36): https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/153863/eth-6705-01.pdf
