Under what conditions does Metal shader code "crash"?

Question

I'm developing a Metal-based app, and in some cases properly compiled and linked shader code will cause the application to simply crash without throwing any errors.

A "crash" consists of a halt in visual output (in some cases preceded by a short stutter of a couple of alternating frames), while the rest of the application otherwise proceeds normally. The Xcode performance monitoring utilities report 60fps but 0ms GPU latency, and CPU-side execution continues, with calls to the Metal API still completing successfully.

No errors are reported to the console.

This is extremely difficult to debug, as I have no indication of where in the shader code the error is coming from. It would help if I knew under what conditions this is actually supposed to happen, so that I can have a good list of things to check. Otherwise I'm just shooting in the dark whenever this comes up.

Answer 1

The GPU can crash when you read or write off the end of a MTLBuffer, write off the end of a MTLTexture, or simply run too long. There is a watchdog timer that will reset the GPU if it doesn't complete its work within a few seconds. Work on the GPU is not preemptively scheduled. It is possible for long-running work to make the device seem locked up by preventing basic GUI tasks from executing. If you have long-running workloads, it is necessary to split them up into many smaller kernels. To keep the interface responsive, you should keep each workload under 100 ms. To avoid video stuttering, a consistent frame rate is recommended.
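The two checks above (stay inside the buffer, keep each dispatch short) can be sketched on the CPU side as follows. This is a minimal sketch in plain C++ with no Metal calls; `Chunk`, `splitWorkload`, and `inBounds` are hypothetical names introduced here for illustration, and the per-dispatch element limit is an assumption you would tune by measuring on your own hardware.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous slice of a larger workload.
struct Chunk {
    std::size_t offset;  // index of the first element this dispatch covers
    std::size_t count;   // number of elements this dispatch covers
};

// Split totalCount elements into chunks of at most maxPerDispatch elements.
// The last chunk is shortened so offset + count never exceeds totalCount.
std::vector<Chunk> splitWorkload(std::size_t totalCount,
                                 std::size_t maxPerDispatch) {
    std::vector<Chunk> chunks;
    if (maxPerDispatch == 0) {
        return chunks;  // no valid chunk size, so produce no work
    }
    for (std::size_t offset = 0; offset < totalCount; offset += maxPerDispatch) {
        chunks.push_back({offset, std::min(maxPerDispatch, totalCount - offset)});
    }
    return chunks;
}

// The same test a kernel would perform as an early-out before touching
// a buffer element: only indices below elementCount are valid.
bool inBounds(std::size_t index, std::size_t elementCount) {
    return index < elementCount;
}
```

Each `Chunk` would correspond to one smaller dispatch, with `offset` and `count` passed to the kernel so that threads beyond `count` return early instead of reading or writing past the end of the buffer.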

Answer 2

I was having frequent crashes due to heavy Metal shaders as well, and managed to fix it by throttling the dispatch rate. You can do this easily by measuring the runtime of the last "frame" and inserting a wait before every dispatch that is proportional to that runtime:

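// Wait in proportion to how long the previous frame's work took.
[NSThread sleepForTimeInterval:_lastRunTime * RATIO];
NSDate *startTime = [NSDate date];
... [use Metal shaders] ...
// timeIntervalSinceNow is negative for a date in the past, hence the negation.
_lastRunTime = -[startTime timeIntervalSinceNow];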

I set RATIO to 1.0, so it never uses more than 50% of the GPU. It obviously impacts frame rate, but it beats random crashes. You can play with the ratio. The nice thing is that you don't have to worry about throttling too much or too little on different products, as it's a ratio of the runtime.
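The 50% figure follows from the arithmetic: if the work takes time t and the wait is t times the ratio, the busy fraction is t / (t + t * ratio) = 1 / (1 + ratio). A small C++ sketch of that relationship is below; `throttleDelay` and `maxBusyFraction` are hypothetical names, and the calculation assumes the measured runtime really covers the GPU work and stays roughly constant from frame to frame.

```cpp
#include <algorithm>

// Seconds to wait before the next dispatch, given the measured runtime of
// the previous frame and the chosen ratio. Negative inputs yield zero.
double throttleDelay(double lastRunTime, double ratio) {
    return std::max(0.0, lastRunTime * ratio);
}

// Upper bound on the fraction of wall-clock time spent on shader work in
// steady state: run / (run + run * ratio) = 1 / (1 + ratio).
double maxBusyFraction(double ratio) {
    if (ratio < 0.0) {
        ratio = 0.0;  // a negative ratio means no throttling at all
    }
    return 1.0 / (1.0 + ratio);
}
```

With a ratio of 1.0 the bound is 0.5, matching the 50% above; a ratio of 3.0 would lower it to 0.25 at the cost of a lower frame rate.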

Answer 1
2 Accepted 2016-06-18 16:17:13

Answer 2
0 2018-01-18 01:54:39
