
Performance cost of float ↔︎ half conversion in Metal

Original post: 2021-03-10 08:34:03, tags: ios, metal, core-image

I have a Metal-based Core Image convolution kernel that was using half-precision variables to keep track of sums and weights. However, I have now found that the range of 16-bit half is not enough in some cases, which means I need 32-bit float for some variables.
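The range problem can be reproduced on the CPU. The following is a minimal sketch, assuming Metal's half behaves like IEEE 754 binary16 (largest finite value 65504), which Python's standard struct module exposes as the "e" format. The helper names and the sample data are hypothetical; this is not the kernel from the question, only an emulation of a running sum that is rounded after every add.

```python
import struct

def to_half(x):
    """Round x to the nearest 16-bit half value (overflow becomes +/-inf)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the half range; hardware would give inf
        return float("inf") if x > 0 else float("-inf")

def to_float(x):
    """Round x to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate(samples, round_acc):
    """Sum half-precision samples, rounding the accumulator after every add."""
    acc = 0.0
    for s in samples:
        acc = round_acc(acc + to_half(s))
    return acc

samples = [255.0] * 300                # hypothetical data, exact sum is 76500
print(accumulate(samples, to_half))    # half accumulator: inf
print(accumulate(samples, to_float))   # float accumulator: 76500.0
```

The samples themselves fit comfortably in half; only the accumulator needs the wider type, which is the situation option A describes.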

Now I'm wondering what's more performant:

A. use half as much as possible (for the samplers and most local vars) and only convert to float when needed (which means quite a lot, inside the loop)
B. or change all samplers and local vars to float type so that no conversion is necessary.

The latter would mean that all arithmetic is performed in 32-bit precision, though it would only be needed for some operations.

Is there any documentation or benchmark I can run to find the cost of float ↔︎ half conversion in Metal?

1 answer

I believe you should go with option A:

use half as much as possible (for the samplers and most local vars) and only convert to float when needed (which means quite a lot, inside the loop)

based on the discussion in the WWDC 2016 talk "Advanced Metal Shader Optimization".

The relevant section for this topic runs from about 17:17 to 18:58. The speaker, Fiona, makes two important points:

A8 and later GPUs have 16-bit registers, so 32-bit floating-point types (like float) use twice as many registers, which means twice as much bandwidth, energy, etc. Using half therefore saves registers (which is always good) and energy.
On A8 and later GPUs, "data type conversions are typically free, even between float and half [emphasis added]." Fiona even poses the questions you might be asking yourself about the cost of all those conversions, and says that the code is still probably fast because the conversions are free.

Furthermore, according to the Metal Shading Language Specification Version 2.3 (pg. 218):

For textures that have half-precision floating-point pixel color values, the conversions from half to float are lossless

so you don't have to worry about losing precision when widening from half to float either.
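The lossless claim can be checked exhaustively on the CPU. This is a small sketch, again assuming Metal's half matches IEEE 754 binary16 and float matches binary32; the function names are made up for illustration. It walks all 65536 half bit patterns and counts the values that change when stored as a 32-bit float.

```python
import struct

def half_from_bits(bits):
    """Interpret a 16-bit pattern as a half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def fits_in_float(x):
    """True if x survives a round trip through 32-bit float unchanged."""
    return struct.unpack("<f", struct.pack("<f", x))[0] == x

def count_lossy_widenings():
    """Count half values that are not exactly representable as float."""
    lossy = 0
    for bits in range(0x10000):
        h = half_from_bits(bits)
        if h != h:          # skip NaN patterns, NaN never compares equal
            continue
        if not fits_in_float(h):
            lossy += 1
    return lossy

print(count_lossy_widenings())   # 0
```

This only covers the widening direction. Narrowing from float to half rounds, and can overflow, so the precision-critical variables should stay in float once converted.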

That section makes some other relevant points that are worth looking into as well, but I believe this is enough to justify going with option A.



Answer 1 (accepted), 2021-04-20 00:29:42
