
Metal iOS simple passthrough compute kernel takes 10 milliseconds on iPhone 5s

I created a simple passthrough compute kernel:

#include <metal_stdlib>
using namespace metal;

// Copies each texel from inTexture to outTexture unchanged.
kernel void filter(texture2d<float, access::read> inTexture [[texture(0)]],
                   texture2d<float, access::write> outTexture [[texture(1)]],
                   uint2 gridPos [[thread_position_in_grid]]) {
  float4 color = inTexture.read(gridPos);
  outTexture.write(color, gridPos);
}

I measure the execution time like this:

[self.timer start];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
CGFloat ms = [self.timer elapse];

The timer class works like this:

- (void)start {
  self.startMach = mach_absolute_time();
}  

- (CGFloat)elapse {
  uint64_t end = mach_absolute_time();
  uint64_t elapsed = end - self.startMach;
  uint64_t nanosecs = elapsed * self.info.numer / self.info.denom;
  uint64_t millisecs = nanosecs / 1000000;

  return millisecs;
}
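Note that `elapse` above divides as integers, so it reports whole milliseconds only, which is coarse when the quantity being measured is around 10 ms. A minimal sketch in plain C of the same conversion keeping the fractional part; `ticks_to_ms` is a hypothetical helper name, and the timebase values are passed in as parameters (they would come from `mach_timebase_info()`) so the sketch has no platform dependency:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick delta (the difference of two mach_absolute_time() readings)
   to milliseconds, keeping the fractional part.
   numer/denom are the timebase values; passing them in is an assumption made
   here to keep the helper portable. */
static double ticks_to_ms(uint64_t ticks, uint32_t numer, uint32_t denom) {
    assert(denom != 0);
    /* Do the arithmetic in floating point so nothing is truncated. */
    double nanosecs = (double)ticks * (double)numer / (double)denom;
    return nanosecs / 1e6;
}
```

For example, with a 1/1 timebase a delta of 13,500,000 ticks comes out as 13.5 ms, where the integer version would report 13.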

Dispatch call:

static const NSUInteger kGroupSize = 16;
- (MTLSize)threadGroupSize {
  return MTLSizeMake(kGroupSize, kGroupSize, 1);
}

- (MTLSize)threadGroupsCount:(MTLSize)threadGroupSize {
  return MTLSizeMake(self.provider.texture.width / kGroupSize,
                 self.provider.texture.height / kGroupSize, 1);
}

[commandEncoder dispatchThreadgroups:threadgroups 
               threadsPerThreadgroup:threadgroupSize];
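The threadgroup count above uses integer division, which is exact for a 512x512 texture with 16x16 groups but would leave the last rows and columns unprocessed for sizes that are not a multiple of 16. A minimal sketch in plain C of the usual round-up calculation; `threadgroups_for` is a hypothetical helper, and if it were used the kernel would also need to return early when `gridPos` falls outside the texture:

```c
#include <assert.h>
#include <stddef.h>

/* Number of threadgroups needed to cover `extent` pixels with groups of
   `group_size` threads, rounding up so a partial group covers the remainder. */
static size_t threadgroups_for(size_t extent, size_t group_size) {
    assert(group_size > 0);
    return (extent + group_size - 1) / group_size;
}
```

With a group size of 16, an extent of 512 still gives 32 groups, while an extent of 500 gives 32 instead of the 31 that plain division produces.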

This gives me 13 ms on a 512x512 RGBA image, and the time grows linearly if I perform more passes.

Is this expected? It seems like too much overhead for a real-time application.

Compute kernels are known to have rather high overhead on A7 processors. One thing to consider, though, is that this is basically the least flattering test you can run: a one-shot threadgroup dispatch might take ~2 ms to get scheduled, but scheduling of subsequent dispatches can be up to an order of magnitude faster. Additionally, there's little chance for latency hiding here. In practice, a much more complex kernel probably wouldn't take substantially longer to execute, and if you can interleave it with whatever rendering you might be doing, you might find performance to be acceptable.
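To put rough numbers on the scheduling point, here is a back-of-the-envelope model in plain C. It is only a sketch: `mean_dispatch_ms` is a hypothetical helper, and the inputs used below (about 2 ms for the first dispatch, about 0.2 ms for later ones) are the rough estimates from this answer, not measurements.

```c
#include <assert.h>

/* Average per-dispatch scheduling cost, in ms, for `count` dispatches where
   the first costs `first_ms` and each later one costs `later_ms`.
   A simple arithmetic model; the inputs are assumptions, not measured data. */
static double mean_dispatch_ms(unsigned count, double first_ms, double later_ms) {
    assert(count > 0);
    return (first_ms + (double)(count - 1) * later_ms) / (double)count;
}
```

Under those assumed figures, `mean_dispatch_ms(10, 2.0, 0.2)` works out to about 0.38 ms per dispatch, compared with 2 ms for a single one-shot dispatch.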
