Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs

Question

I'm working on macOS project that uses Swift and Metal for image processing on the GPU. Last week, I received my new 15-inch MacBook Pro (late 2016) and noticed something strange with my code: kernels that were supposed to write to a texture did not seem to do so...

After a lot of digging, I found that the problem is related to which GPU is used by Metal (AMD Radeon Pro 455 or Intel(R) HD Graphics 530) to do the computation.

Initializing the MTLDevice using MTLCopyAllDevices() returns an array of devices representing the Radeon and the Intel GPUs (while MTLCreateSystemDefaultDevice() returns the default device which is the Radeon). In any case, the code works as expected with the Intel GPU but that is not the case with the Radeon GPU.

Let me show you an example.

To start, here is a simple kernel that takes an input texture and copies its colour to an output texture:

    kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                            texture2d<uint, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
    {
        uint4 out = inTexture.read(gid);
        outTexture.write(out, gid);
    }

I order to use this kernel, I use this piece of code:

    let devices = MTLCopyAllDevices()
    for device in devices {
        print(device.name!) // [0] -> "AMD Radeon Pro 455", [1] -> "Intel(R) HD Graphics 530"
    }

    let device = devices[0] 
    let library = device.newDefaultLibrary()
    let commandQueue = device.makeCommandQueue()

    let passthroughKernelFunction = library!.makeFunction(name: "passthrough")

    let cps = try! device.makeComputePipelineState(function: passthroughKernelFunction!)

    let commandBuffer = commandQueue.makeCommandBuffer()
    let commandEncoder = commandBuffer.makeComputeCommandEncoder()

    commandEncoder.setComputePipelineState(cps)

    // Texture setup
    let width = 16
    let height = 16
    let byteCount = height*width*4
    let bytesPerRow = width*4
    let region = MTLRegionMake2D(0, 0, width, height)
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false)

    // inTexture
    var inData = [UInt8](repeating: 255, count: Int(byteCount))
    let inTexture = device.makeTexture(descriptor: textureDescriptor)
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: &inData, bytesPerRow: bytesPerRow)

    // outTexture
    var outData = [UInt8](repeating: 128, count: Int(byteCount))
    let outTexture = device.makeTexture(descriptor: textureDescriptor)
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: &outData, bytesPerRow: bytesPerRow)

    commandEncoder.setTexture(inTexture, at: 0)
    commandEncoder.setTexture(outTexture, at: 1)
    commandEncoder.dispatchThreadgroups(MTLSize(width: 1,height: 1,depth: 1), threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1))

    commandEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Get the data back from the GPU
    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region , mipmapLevel: 0)

    // Validation
    // outData should be exactly the same as inData 
    for (i,outElement) in outData.enumerated() {
        if outElement != inData[i] {
            print("Dest: \(outElement) != Src: \(inData[i]) at \(i))")
        }
    }

When running this code with let device = devices[0] (Radeon GPU), outTexture is never written to (my supposition) and as a result outData stays unchanged. On the other hand, when running this code with let device = devices[1] (Intel GPU), everything works as expected and outData is updated with the values in inData.

Answer 1

I think that whenever the GPU writes to a MTLStorageModeManaged resource such as a texture and you then want to read that resource from the CPU (eg using getBytes() ), you need to synchronize it using a blit encoder. Try putting the following above the commandBuffer.commit() line:

let blitEncoder = commandBuffer.makeBlitCommandEncoder()
blitEncoder.synchronize(outTexture)
blitEncoder.endEncoding()

You may get away without this on an integrated GPU because the GPU is using system memory for the resource and there's nothing to synchronize.

Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs

Question

1 answers

solution1
8 ACCPTED 2016-11-25 04:08:46

Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs

Question

1 answers

solution1 8 ACCPTED 2016-11-25 04:08:46

solution1
8 ACCPTED 2016-11-25 04:08:46