Performance cost of float ↔︎ half conversion in Metal

I have a Metal-based Core Image convolution kernel that used half-precision variables to keep track of sums and weights. However, I have now found that the range of 16-bit half is not enough in some cases, which means I need 32-bit float for some variables.
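To make the range problem concrete, here is a minimal CPU-side sketch in Python (not Metal code). It assumes Metal's half behaves like IEEE 754 binary16, which is what the struct module's "e" format implements. The helper names to_half, to_float32 and accumulate are made up for this illustration, and a real GPU may round or handle overflow differently (for example with fast math enabled).

```python
import struct

HALF_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def to_half(x):
    """Round x to the nearest binary16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the half range,
        # so model the overflow as infinity (an assumption about the GPU).
        return float("inf") if x > 0 else float("-inf")


def to_float32(x):
    """Round x to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, rounder):
    """Sum values, rounding the running total after every addition."""
    total = 0.0
    for v in values:
        total = rounder(total + v)
    return total


ones = [1.0] * 4096
print(accumulate(ones, to_half))        # 2048.0: the half sum stops growing
print(accumulate(ones, to_float32))     # 4096.0

bright = [250.0] * 300
print(accumulate(bright, to_half))      # inf: the sum exceeded HALF_MAX
print(accumulate(bright, to_float32))   # 75000.0
```

So a half accumulator can fail in two ways: it overflows past 65504, and well before that it stops registering small contributions once the running sum is large.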

Now I'm wondering what's more performant:

  • use half as much as possible (for the samplers and most local vars) and only convert to float when needed (which means quite a lot, inside the loop)
  • or change all samplers and local vars to float type so that no conversion is necessary.

The latter would mean that all arithmetic is performed in 32-bit precision, though it would only be needed for some operations.

Is there any documentation or benchmark I can run to find the cost of float ↔︎ half conversion in Metal?

I believe you should go with option A:

use half as much as possible (for the samplers and most local vars) and only convert to float when needed (which means quite a lot, inside the loop)

based on the discussion in the WWDC 2016 talk entitled "Advanced Metal Shader Optimization".

The relevant section for this topic runs from about 17:17 to 18:58. The speaker, Fiona, mentions a couple of important things:

  1. A8 and later GPUs have 16-bit registers, so 32-bit floating-point types (like float) use twice as many registers, which means twice as much bandwidth, energy, etc. Using half therefore saves registers (which is always good) and energy.
  2. On A8 and later GPUs, "data type conversions are typically free, even between float and half [emphasis added]." Fiona even raises the questions you might be asking yourself about the cost of all those conversions, and says the code is still probably fast because the conversions are free. Furthermore, according to the Metal Shading Language Specification Version 2.3 (pg. 218):

For textures that have half-precision floating-point pixel color values, the conversions from half to float are lossless

so you don't have to worry about losing precision in that direction either.
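If you want to convince yourself of the lossless claim, here is a small CPU-side sketch in Python (again not Metal, and again assuming half is IEEE 754 binary16 and float is binary32). It walks all 65536 half bit patterns, widens each one to float, and checks that the value survives and narrows back to the same bits. The function names are hypothetical, and NaN payloads are deliberately skipped.

```python
import math
import struct


def half_from_bits(bits):
    """Interpret a 16-bit integer as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bits_from_half(value):
    """Round value to binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def to_float32(x):
    """Round x to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def count_lossy_half_values():
    """Count half values that change when widened to float and back."""
    lossy = 0
    for bits in range(1 << 16):
        h = half_from_bits(bits)
        if math.isnan(h):
            continue  # NaN payloads are not compared in this sketch
        f = to_float32(h)
        if f != h or bits_from_half(f) != bits:
            lossy += 1
    return lossy


print(count_lossy_half_values())  # 0: half -> float loses nothing

# The other direction is where precision goes: float -> half rounds.
x = to_float32(0.1)
print(half_from_bits(bits_from_half(x)) == x)  # False
```

This only says the conversion itself is exact going from half to float. Narrowing a float result back to half still rounds, so the variables that need the extra range or precision have to stay float for as long as they accumulate.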

There are other relevant points worth looking into in that section, but I believe this is enough to justify going with option A.
