Compute Kernel Metal - How to retrieve results and debug?

I've downloaded Apple's TrueDepth Streamer example and am trying to add a compute pipeline. I think I'm retrieving the results of the computation, but I'm not sure, as they all seem to be zero.

I'm a beginner at iOS development, so there may be quite a few mistakes. Please bear with me!

The pipeline setup (I wasn't quite sure how to create the results buffer, since the kernel outputs a float3):

int resultsCount = CVPixelBufferGetWidth(depthFrame) * CVPixelBufferGetHeight(depthFrame);

// because I will output 3 floats for each value in depthFrame
id<MTLBuffer> resultsBuffer = [self.device newBufferWithLength:(sizeof(float) * 3 * resultsCount) options:MTLResourceOptionCPUCacheModeDefault];


_threadgroupSize = MTLSizeMake(16, 16, 1);

// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width  = (inTexture.width  + _threadgroupSize.width -  1) / _threadgroupSize.width;
_threadgroupCount.height = (inTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;

// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;

id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];

[computeEncoder setComputePipelineState:_computePipelineState];

[computeEncoder setTexture: inTexture atIndex:0];

[computeEncoder setBuffer:resultsBuffer offset:0 atIndex:1];

[computeEncoder setBytes:&intrinsics length:sizeof(intrinsics) atIndex:0];

[computeEncoder dispatchThreadgroups:_threadgroupCount
                       threadsPerThreadgroup:_threadgroupSize];

[computeEncoder endEncoding];


// Finalize rendering here & push the command buffer to the GPU
[commandBuffer commit];

//for testing
[commandBuffer waitUntilCompleted];

I have added the following compute kernel:

kernel void
calc(texture2d<float, access::read>  inTexture  [[texture(0)]],
                device float3 *resultsBuffer [[buffer(1)]],
                constant float3x3& cameraIntrinsics [[ buffer(0) ]],
                uint2 gid [[thread_position_in_grid]])
{

    float val = inTexture.read(gid).x * 1000.0f;

    float xrw = (gid.x - cameraIntrinsics[2][0]) * val / cameraIntrinsics[0][0];
    float yrw = (gid.y - cameraIntrinsics[2][1]) * val / cameraIntrinsics[1][1];

    int vertex_id = ((gid.y * inTexture.get_width()) + gid.x);

    resultsBuffer[vertex_id] = float3(xrw, yrw, val);

}

Code for inspecting the buffer results (I tried two different ways, and both currently output all zeroes):

    void *output = [resultsBuffer contents];
    for (int i = 0; i < 10; ++i) {
        NSLog(@"value is %f", *(float *)(output) ); //= *(float *)(output + 4 * i);
    }

    NSData *data = [NSData dataWithBytesNoCopy:resultsBuffer.contents length:(sizeof(float) * 3 * resultsCount)freeWhenDone:NO];
    float *finalArray = new float [resultsCount * 3];
    [data getBytes:&finalArray[0] length:sizeof(finalArray)];
    for (int i = 0; i < 10; ++i) {
        NSLog(@"here is output %f", finalArray[i]);
    }

I see a couple of problems here, but neither of them is related to your Metal code per se.

In your first output loop, as written, you're just printing the first element of the results buffer 10 times. The first element may legitimately be 0, leading you to believe all of the results are zero. But when I changed the first log line to

NSLog(@"value is %f", ((float *)output)[i]);

I saw different values printed when running your kernel on a test image.
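The difference between the two loops can be reproduced without a GPU. The following is a minimal C++ sketch, assuming a small array (fake_contents, a hypothetical stand-in for the memory returned by [resultsBuffer contents]) and two hypothetical helper functions that mirror the original and the corrected log line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the memory returned by [resultsBuffer contents].
static float fake_contents[6] = {0.0f, 1.5f, 2.5f, 3.5f, 4.5f, 5.5f};

// Mirrors the original loop: the base pointer is dereferenced on every
// iteration, so every value read is a copy of element 0.
std::vector<float> read_first_element_repeatedly(const void *output, std::size_t n) {
    assert(output != nullptr);
    std::vector<float> values;
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(*static_cast<const float *>(output));
    }
    return values;
}

// Mirrors the corrected loop: cast once, then index with i.
std::vector<float> read_elements_in_order(const void *output, std::size_t n) {
    assert(output != nullptr);
    const float *floats = static_cast<const float *>(output);
    std::vector<float> values;
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(floats[i]);
    }
    return values;
}
```

With this sample data the first helper returns only zeroes, because element 0 happens to be zero, while the second returns the distinct values stored in the array.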

The other issue is related to your getBytes:length: call. You want to pass the number of bytes to copy, but sizeof(finalArray) is actually the size of the finalArray pointer, i.e., 8 bytes on a 64-bit device (4 on a 32-bit one), not the total size of the buffer it points to. This is an extremely common error in C and C++ code.
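The pitfall can be sketched in plain C++. The helper names below (copy_bytes, pointer_size, buffer_size) are hypothetical and only model the copy. They are not part of NSData or Metal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies byte_count bytes from src into a zero-initialised array of
// float_count floats and returns that array.
std::vector<float> copy_bytes(const float *src, std::size_t float_count,
                              std::size_t byte_count) {
    assert(byte_count <= float_count * sizeof(float));
    std::vector<float> dst(float_count, 0.0f);
    std::memcpy(dst.data(), src, byte_count);
    return dst;
}

// What sizeof(finalArray) evaluates to when finalArray is a float *:
// the size of the pointer itself, independent of the allocation.
std::size_t pointer_size() {
    float *finalArray = nullptr;
    return sizeof(finalArray);
}

// The byte count that was intended: element size times element count.
std::size_t buffer_size(std::size_t float_count) {
    return sizeof(float) * float_count;
}
```

Copying pointer_size() bytes fills only the first one or two floats of the destination and leaves the rest at their initial value, whereas copying buffer_size(n) bytes fills all n floats.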

Instead, you can use the same byte count as the one you used when allocating space:

[data getBytes:&finalArray[0] length:(sizeof(float) * 3 * resultsCount)];

You should then find that you get the same (non-zero) values printed as in the previous step.
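One further point may be worth checking, although the answer above does not cover it. In the Metal Shading Language, float3 has a size and alignment of 16 bytes (12 bytes of data plus 4 of padding), while packed_float3 occupies 12 bytes. A device float3 * is therefore indexed with a 16-byte stride, so a buffer sized as sizeof(float) * 3 * resultsCount is likely too small for the kernel as written, and reading it back as tightly packed floats would mix padding in with the data. The sketch below models both layouts in plain C++. PaddedFloat3 and PackedFloat3 are hypothetical stand-ins for illustration, not Metal types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side model of MSL float3: 16-byte size and alignment.
struct alignas(16) PaddedFloat3 { float x, y, z; };

// Hypothetical CPU-side model of MSL packed_float3: three tightly packed floats.
struct PackedFloat3 { float x, y, z; };

static_assert(sizeof(PaddedFloat3) == 16, "models a 16-byte stride");
static_assert(sizeof(PackedFloat3) == 12, "models a 12-byte stride");

// Bytes needed to hold count elements at the given stride.
std::size_t buffer_length(std::size_t count, std::size_t stride) {
    return count * stride;
}

// Index of component c (0 = x, 1 = y, 2 = z) of element i when the buffer
// is viewed as a flat float array with the given stride in bytes.
std::size_t float_index(std::size_t i, std::size_t c, std::size_t stride) {
    assert(c < 3);
    return i * (stride / sizeof(float)) + c;
}
```

Under these assumptions there are two ways to keep the kernel and the CPU code in agreement: allocate 16 bytes per element and step four floats per element when reading, or declare the kernel parameter as device packed_float3 * and keep the 12-byte layout used in the question.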
