
What does posenet return?

I'm working on a project that reads an image as input and shows an output image. The output image contains some lines indicating the human body skeleton. I'm using the pose estimation model from TensorFlow Lite:

https://www.tensorflow.org/lite/models/pose_estimation/overview

I have read the docs, which say that the output contains a 4-dimensional array. I tried using netron to visualize my model file, and it looks like this: (model visualization screenshot)

I succeeded in getting the result heatmap from the input, but I ran into a problem: all the floats are negative. This confuses me, and I'm not sure whether I did something wrong or how to interpret these outputs.

Here's the code for the output:

            tfLite = new Interpreter(loadModelFile());
            Bitmap inputPhoto = BitmapFactory.decodeResource(getResources(), R.drawable.human2);
            inputPhoto = Bitmap.createScaledBitmap(inputPhoto, INPUT_SIZE_X, INPUT_SIZE_Y, false);
            inputPhoto = inputPhoto.copy(Bitmap.Config.ARGB_8888, true);

            int pixels[] = new int[INPUT_SIZE_X * INPUT_SIZE_Y];

            inputPhoto.getPixels(pixels, 0, INPUT_SIZE_X, 0, 0, INPUT_SIZE_X, INPUT_SIZE_Y);

            // inputData must match the model's float input shape [1, height, width, 3]
            float inputData[][][][] = new float[1][INPUT_SIZE_X][INPUT_SIZE_Y][3];

            int pixelsIndex = 0;

            for (int i = 0; i < INPUT_SIZE_X; i ++) {
                for (int j = 0; j < INPUT_SIZE_Y; j++) {
                    int p = pixels[pixelsIndex];
                    inputData[0][i][j][0] = (p >> 16) & 0xff;
                    inputData[0][i][j][1] = (p >> 8) & 0xff;
                    inputData[0][i][j][2] = (p) & 0xff;
                    pixelsIndex ++;
                }
            }

            float outputData[][][][] = new float[1][23][17][17];

            tfLite.run(inputData, outputData);

The output is an array [1][23][17][17] whose values are all negative. Is there anyone familiar with this who can help me? :(

Thanks a lot!

This post became active today, so I'm posting a late answer, sorry about that.
You should check the Posenet.kt file. There you can see very thoroughly documented code. You can see how it initializes the output map:

/**
 * Initializes an outputMap of 1 * x * y * z FloatArrays for the model processing to populate.
 */
private fun initOutputMap(interpreter: Interpreter): HashMap<Int, Any> {
    val outputMap = HashMap<Int, Any>()

    // 1 * 9 * 9 * 17 contains heatmaps
    val heatmapsShape = interpreter.getOutputTensor(0).shape()
    outputMap[0] = Array(heatmapsShape[0]) {
      Array(heatmapsShape[1]) {
        Array(heatmapsShape[2]) { FloatArray(heatmapsShape[3]) }
      }
    }

    // 1 * 9 * 9 * 34 contains offsets
    val offsetsShape = interpreter.getOutputTensor(1).shape()
    outputMap[1] = Array(offsetsShape[0]) {
      Array(offsetsShape[1]) { Array(offsetsShape[2]) { FloatArray(offsetsShape[3]) } }
    }

    // 1 * 9 * 9 * 32 contains forward displacements
    val displacementsFwdShape = interpreter.getOutputTensor(2).shape()
    outputMap[2] = Array(displacementsFwdShape[0]) {
      Array(displacementsFwdShape[1]) {
        Array(displacementsFwdShape[2]) { FloatArray(displacementsFwdShape[3]) }
      }
    }

    // 1 * 9 * 9 * 32 contains backward displacements
    val displacementsBwdShape = interpreter.getOutputTensor(3).shape()
    outputMap[3] = Array(displacementsBwdShape[0]) {
      Array(displacementsBwdShape[1]) {
        Array(displacementsBwdShape[2]) { FloatArray(displacementsBwdShape[3]) }
      }
    }

    return outputMap
}
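
A quick aside before the next part: rather than hard-coding an output shape such as [1][23][17][17], you can ask the interpreter for the real shapes of all four output tensors, exactly as initOutputMap does above. A minimal sketch, assuming interpreter is an org.tensorflow.lite.Interpreter already loaded with the PoseNet model:

// Log every output tensor's shape so you know exactly what the model returns.
// `interpreter` is assumed to be an org.tensorflow.lite.Interpreter with the PoseNet model loaded.
for (i in 0 until interpreter.outputTensorCount) {
    Log.i("posenet", "output $i shape = ${interpreter.getOutputTensor(i).shape().contentToString()}")
}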

and of course how the output is transformed to points on screen:

/**
 * Estimates the pose for a single person.
 * args:
 *      bitmap: image bitmap of frame that should be processed
 * returns:
 *      person: a Person object containing data about keypoint locations and confidence scores
 */
fun estimateSinglePose(bitmap: Bitmap): Person {
    val estimationStartTimeNanos = SystemClock.elapsedRealtimeNanos()
    val inputArray = arrayOf(initInputArray(bitmap))
    Log.i(
        "posenet",
        String.format(
            "Scaling to [-1,1] took %.2f ms",
            1.0f * (SystemClock.elapsedRealtimeNanos() - estimationStartTimeNanos) / 1_000_000
        )
    )

    val outputMap = initOutputMap(getInterpreter())

    val inferenceStartTimeNanos = SystemClock.elapsedRealtimeNanos()
    getInterpreter().runForMultipleInputsOutputs(inputArray, outputMap)
    lastInferenceTimeNanos = SystemClock.elapsedRealtimeNanos() - inferenceStartTimeNanos
    Log.i(
        "posenet",
        String.format("Interpreter took %.2f ms", 1.0f * lastInferenceTimeNanos / 1_000_000)
    )

    val heatmaps = outputMap[0] as Array<Array<Array<FloatArray>>>
    val offsets = outputMap[1] as Array<Array<Array<FloatArray>>>

    val height = heatmaps[0].size
    val width = heatmaps[0][0].size
    val numKeypoints = heatmaps[0][0][0].size

    // Finds the (row, col) locations of where the keypoints are most likely to be.
    val keypointPositions = Array(numKeypoints) { Pair(0, 0) }
    for (keypoint in 0 until numKeypoints) {
        var maxVal = heatmaps[0][0][0][keypoint]
        var maxRow = 0
        var maxCol = 0
        for (row in 0 until height) {
            for (col in 0 until width) {
                if (heatmaps[0][row][col][keypoint] > maxVal) {
                    maxVal = heatmaps[0][row][col][keypoint]
                    maxRow = row
                    maxCol = col
                }
            }
        }
        keypointPositions[keypoint] = Pair(maxRow, maxCol)
    }

    // Calculating the x and y coordinates of the keypoints with offset adjustment.
    val xCoords = IntArray(numKeypoints)
    val yCoords = IntArray(numKeypoints)
    val confidenceScores = FloatArray(numKeypoints)
    keypointPositions.forEachIndexed { idx, position ->
        val positionY = keypointPositions[idx].first
        val positionX = keypointPositions[idx].second
        yCoords[idx] = (
                position.first / (height - 1).toFloat() * bitmap.height +
                        offsets[0][positionY][positionX][idx]
                ).toInt()
        xCoords[idx] = (
                position.second / (width - 1).toFloat() * bitmap.width +
                        offsets[0][positionY]
                                [positionX][idx + numKeypoints]
                ).toInt()
        confidenceScores[idx] = sigmoid(heatmaps[0][positionY][positionX][idx])
    }

    val person = Person()
    val keypointList = Array(numKeypoints) { KeyPoint() }
    var totalScore = 0.0f
    enumValues<BodyPart>().forEachIndexed { idx, it ->
        keypointList[idx].bodyPart = it
        keypointList[idx].position.x = xCoords[idx]
        keypointList[idx].position.y = yCoords[idx]
        keypointList[idx].score = confidenceScores[idx]
        totalScore += confidenceScores[idx]
    }

    person.keyPoints = keypointList.toList()
    person.score = totalScore / numKeypoints

    return person
}
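
Note the sigmoid call above: the raw heatmap values are logits, so they can be (and usually are) negative; they only become confidence scores in [0, 1] after sigmoid is applied. The helper in Posenet.kt looks roughly like this (a minimal sketch; it needs import kotlin.math.exp):

/** Maps a raw heatmap logit to a confidence score in the range [0, 1]. */
private fun sigmoid(x: Float): Float {
    return 1.0f / (1.0f + exp(-x))
}

So seeing all-negative floats in the heatmap output does not by itself mean anything went wrong.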

The whole .kt file is the heart of turning a bitmap into points on screen!
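
If you pull the sample's Posenet class into your project, using it comes down to a few lines. A hypothetical usage sketch (the constructor arguments and the 257x257 input size follow the TensorFlow Lite PoseNet Android sample; adjust them to your copy of the code):

// Hypothetical usage, assuming the Posenet class from the TF Lite Android sample is in the project.
val posenet = Posenet(this)  // the constructor may take a model filename / device in your copy
val scaled = Bitmap.createScaledBitmap(inputPhoto, 257, 257, true)  // inputPhoto: your source Bitmap; 257x257 is the sample model's input size
val person = posenet.estimateSinglePose(scaled)
person.keyPoints.forEach { kp ->
    Log.i("posenet", "${kp.bodyPart}: (${kp.position.x}, ${kp.position.y}) score=${kp.score}")
}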

If you need anything else, tag me.

Happy coding!
