Metal compute kernel: how do I retrieve the results and debug?

Question

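I downloaded Apple's TrueDepth streaming sample and am trying to add a compute pipeline to it. I think I am retrieving the compute results, but I'm not sure, because they all appear to be zero.

I'm a beginner at iOS development, so there may be plenty of mistakes; please bear with me!

Pipeline setup (I'm not sure how to create the results buffer, since the kernel outputs a float3):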

int resultsCount = CVPixelBufferGetWidth(depthFrame) * CVPixelBufferGetHeight(depthFrame);

// because I will output 3 floats for each value in the depth frame
id<MTLBuffer> resultsBuffer = [self.device newBufferWithLength:(sizeof(float) * 3 * resultsCount) options:MTLResourceOptionCPUCacheModeDefault];


_threadgroupSize = MTLSizeMake(16, 16, 1);

// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width  = (inTexture.width  + _threadgroupSize.width -  1) / _threadgroupSize.width;
_threadgroupCount.height = (inTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;

// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;

id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];

[computeEncoder setComputePipelineState:_computePipelineState];

[computeEncoder setTexture: inTexture atIndex:0];

[computeEncoder setBuffer:resultsBuffer offset:0 atIndex:1];

[computeEncoder setBytes:&intrinsics length:sizeof(intrinsics) atIndex:0];

[computeEncoder dispatchThreadgroups:_threadgroupCount
                       threadsPerThreadgroup:_threadgroupSize];

[computeEncoder endEncoding];


// Finalize rendering here & push the command buffer to the GPU
[commandBuffer commit];

//for testing
[commandBuffer waitUntilCompleted];

I added the following compute kernel:

kernel void
calc(texture2d<float, access::read>  inTexture  [[texture(0)]],
                device float3 *resultsBuffer [[buffer(1)]],
                constant float3x3& cameraIntrinsics [[ buffer(0) ]],
                uint2 gid [[thread_position_in_grid]])
{

    float val = inTexture.read(gid).x * 1000.0f;

    float xrw = (gid.x - cameraIntrinsics[2][0]) * val / cameraIntrinsics[0][0];
    float yrw = (gid.y - cameraIntrinsics[2][1]) * val / cameraIntrinsics[1][1];

    int vertex_id = ((gid.y * inTexture.get_width()) + gid.x);

    resultsBuffer[vertex_id] = float3(xrw, yrw, val);

}

Code to inspect the buffer results (I tried two different approaches; both currently print all zeros):

    void *output = [resultsBuffer contents];
    for (int i = 0; i < 10; ++i) {
        NSLog(@"value is %f", *(float *)(output) ); //= *(float *)(output + 4 * i);
    }

    NSData *data = [NSData dataWithBytesNoCopy:resultsBuffer.contents length:(sizeof(float) * 3 * resultsCount)freeWhenDone:NO];
    float *finalArray = new float [resultsCount * 3];
    [data getBytes:&finalArray[0] length:sizeof(finalArray)];
    for (int i = 0; i < 10; ++i) {
        NSLog(@"here is output %f", finalArray[i]);
    }

Answer 1

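I see a couple of problems here, but none of them is in your Metal code itself.

In your first output loop, as written, you print the first element of the results buffer 10 times. The first element may legitimately be 0, which would lead you to believe that all of the results are zero. But when I changed the first log line to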

NSLog(@"value is %f", ((float *)output)[i]);

and ran the kernel on a test image, I saw different values printed.
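
To make the difference concrete, here is a minimal C++ sketch in which an ordinary float array stands in for the memory returned by [resultsBuffer contents]. The helper names readWithoutIndex and readWithIndex are hypothetical and exist only for illustration; no Metal is involved.

```cpp
#include <cstddef>
#include <vector>

// Mimics the original loop: the base pointer is dereferenced on every
// iteration, so the first element is returned `count` times.
std::vector<float> readWithoutIndex(const void *output, std::size_t count) {
    std::vector<float> values;
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(*static_cast<const float *>(output));
    }
    return values;
}

// Mimics the corrected loop: the pointer is cast once and then indexed,
// so iteration i reads element i.
std::vector<float> readWithIndex(const void *output, std::size_t count) {
    const float *floats = static_cast<const float *>(output);
    std::vector<float> values;
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(floats[i]);
    }
    return values;
}
```

If the first float in the buffer happens to be 0, readWithoutIndex returns nothing but zeros no matter what the rest of the buffer holds, which matches the symptom in the question.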

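The other problem is with your getBytes:length: call. You want to pass the number of bytes to copy, but sizeof(finalArray) is actually the size of the finalArray pointer (8 bytes on 64-bit iOS devices, 4 on 32-bit ones), not the total size of the buffer it points to. This is an extremely common mistake in C and C++ code.

Instead, you can pass the same byte count you used when allocating the space: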

[data getBytes:&finalArray[0] length:(sizeof(float) * 3 * resultsCount)];

You should then find that the printed values are the same (nonzero) ones as in the previous step.
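
The sizeof pitfall can be reproduced without Metal or Foundation. The following C++ sketch uses std::memcpy as a stand-in for getBytes:length:. The function names pointerSize, payloadSize, and copyBytes are hypothetical, and unlike the question's code the destination is zero-initialised so that the uncopied part is easy to spot.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Size of the pointer variable itself. This is what sizeof(finalArray)
// yields when finalArray is a float*, regardless of the allocation size.
std::size_t pointerSize() {
    float *finalArray = nullptr;
    return sizeof(finalArray);
}

// Bytes actually needed for resultsCount points of 3 floats each.
std::size_t payloadSize(std::size_t resultsCount) {
    return sizeof(float) * 3 * resultsCount;
}

// Copies at most byteCount bytes from source into a zero-initialised
// destination of the same length. Elements past byteCount stay 0.
std::vector<float> copyBytes(const std::vector<float> &source,
                             std::size_t byteCount) {
    std::vector<float> destination(source.size(), 0.0f);
    std::size_t available = source.size() * sizeof(float);
    std::memcpy(destination.data(), source.data(),
                std::min(byteCount, available));
    return destination;
}
```

With a source of six floats, copyBytes(source, pointerSize()) fills in only the first one or two values, while copyBytes(source, payloadSize(2)) reproduces the whole array.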

Solution 1
2 Accepted 2019-10-04 22:46:29
