
Convert pixel coordinates to frame coordinates

I am using a small window to detect Mario, who is represented by a red block. However, this red block is composed of 16 by 12 pixels. I want to take the pixel coordinates I found and convert them to a normal x/y coordinate system based on the window shown in the image (Actual frame), which should be a 13 by 16 grid (NOT pixels).

So, for example, if the Mario box is in the upper-left corner of the screen, the coordinates should be 0,0.

I'm also not sure how to actually make the grid.

The code I'm using is as follows:

import numpy as np
from PIL import Image


class MarioPixels:

    def __init__(self):
        self.mario = np.array([

            [[248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0],
             [248, 56, 0]
             ]]
        )

        self.height = len(self.mario)  # pattern height in pixels (number of rows)
        self.width = len(self.mario[0])  # pattern width in pixels (pixels per row)

        print(self.mario.shape)

    # find difference in R, G and B values between what's in window and what's on the frame
    def pixelDiff(self, p1, p2):
        return abs(p1[0] - p2[0]), abs(p1[1] - p2[1]), abs(p1[2] - p2[2])

    def isMario(self, window, pattern):
        total = [0, 0, 0]
        count = 0
        for line in range(len(pattern)):

            lineItem = pattern[line]
            sample = window[line]

            for pixelIdx in range(len(lineItem)):
                count += 1
                pixel1 = lineItem[pixelIdx]
                pixel2 = sample[pixelIdx]
                d1, d2, d3 = self.pixelDiff(pixel1, pixel2)
                # print(pixelIdx)
                total[0] = total[0] + d1  # sum of R differences between window and pattern
                total[1] = total[1] + d2  # sum of G differences between window and pattern
                total[2] = total[2] + d3  # sum of B differences between window and pattern
                # Mario has a red hat
                # if line == 0 and pixelIdx == 4 and pixel2[0] != 248:
                #    return 1.0

        # normalize each channel's summed difference by the largest possible sum (count * 255)
        rscore = total[0] / (count * 255)
        gscore = total[1] / (count * 255)
        bscore = total[2] / (count * 255)

        # averaged to find a value between 0 and 1. A number close to 0 means the object
        # (mario, pipe, etc.) is there, whereas a number close to 1 means it was not found.
        return (rscore + gscore + bscore) / 3.0

    def searchForMario(self, step, state, pattern):

        height = self.height
        width = self.width

        x1 = 0
        y1 = 0
        x2 = width
        y2 = height

        imageIdx = 0
        bestScore = 1.1
        bestImage = None
        bestImageIdx = 0  # defined up front so it always exists when saving below
        bestx1, bestx2, besty1, besty2 = 0, 0, 0, 0

        for y1 in range(0, 240 - height, 8):  # step through rows, 8 at a time
            y2 = y1 + height

            for x1 in range(0, 256 - width, 3):  # step through columns, 3 at a time
                x2 = x1 + width

                window = state[y1:y2, x1:x2, :]
                score = self.isMario(window, pattern)
                # print(imageIdx, score)
                if score < bestScore:
                    bestScore = score
                    bestImageIdx = imageIdx
                    bestImage = Image.fromarray(window)
                    bestx1, bestx2, besty1, besty2 = x1, x2, y1, y2

                imageIdx += 1

        bestImage.save('testrgb' + str(step) + '_' + str(bestImageIdx) + '_' + str(bestScore) + '.png')

        return bestx1, bestx2, besty1, besty2

It looks like you've got a pixel aspect ratio at play here, so the width and height of each "block" in pixels will be different.

Going by your code, your pixel space is 256x240 pixels, but you say that it actually represents a 13x16 grid. This means that every block is (256/13), or about 20 pixels, wide in the x-domain and (240/16) = 15 pixels tall in the y-domain. So "Mario", at 16x12 pixels, occupies less than one complete block. Looking at your image, this seems plausible: bushes and clouds also occupy less than one block.
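As a quick check of that arithmetic, here is a minimal sketch. It follows the reading above (13 columns across the 256-pixel width, 16 rows down the 240-pixel height); if your grid is actually the other way round, swap the two grid constants. The names `block_size`, `GRID_COLS` and `GRID_ROWS` are made up for illustration.

```python
# Frame size in pixels, taken from the ranges in searchForMario.
FRAME_W_PX, FRAME_H_PX = 256, 240
# Assumed grid layout: 13 blocks across, 16 blocks down.
GRID_COLS, GRID_ROWS = 13, 16
# Size of the red "Mario" block in pixels, as stated in the question.
MARIO_W_PX, MARIO_H_PX = 16, 12


def block_size(frame_w, frame_h, cols, rows):
    """Return the (width, height) of one grid block in pixels (may be fractional)."""
    return frame_w / cols, frame_h / rows


block_w, block_h = block_size(FRAME_W_PX, FRAME_H_PX, GRID_COLS, GRID_ROWS)
print(round(block_w, 2), block_h)

# Mario fits inside a single block if he is smaller than it in both directions.
mario_fits = MARIO_W_PX < block_w and MARIO_H_PX < block_h
print(mario_fits)
```

Note that the block width does not come out as a whole number of pixels under this reading, which is one reason to double-check the grid dimensions.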

I suggest you first make sure the 13x16 grid is correct (simply because it doesn't seem to match your pixel size exactly, and because the stride sizes in your ranges imply that blocks might actually be 3x8 pixels). Then, you can add the grid to the pixel image by setting every pixel whose x-coordinate is exactly divisible by 20, and every pixel whose y-coordinate is exactly divisible by 15, to (0,0,0), which is a black RGB pixel (use the modulus operator % to test divisibility). To get the "block" coordinates, divide the x-coordinate by 20 and the y-coordinate by 15 and round down to the nearest whole number (or use // to do the rounding as part of the division).
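The grid overlay could be sketched like this with NumPy, assuming the frame is a `(240, 256, 3)` RGB array like the `state` passed to `searchForMario`. The function name `draw_grid` and its defaults (20 and 15 pixels per block) are illustrative choices, not part of the original code.

```python
import numpy as np


def draw_grid(frame, block_w=20, block_h=15):
    """Return a copy of `frame` with black grid lines drawn on block boundaries."""
    out = frame.copy()
    height, width = out.shape[:2]

    # Vertical lines: every column whose x-coordinate is divisible by block_w.
    xs = np.arange(width)
    out[:, xs % block_w == 0] = 0

    # Horizontal lines: every row whose y-coordinate is divisible by block_h.
    ys = np.arange(height)
    out[ys % block_h == 0] = 0

    return out


# Example: overlay the grid on a plain white test frame.
white = np.full((240, 256, 3), 255, dtype=np.uint8)
gridded = draw_grid(white)
```

The result can be saved for inspection with `Image.fromarray(gridded).save(...)`, the same way the question's code saves its best-matching window.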

I've assumed that your pixel co-ordinates also run from top left (0,0) to bottom right (256, 240).
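Under that assumption, the conversion itself is a floor division. The sketch below uses hypothetical helper names (`pixel_to_block`, `window_to_block`) and takes the centre of the window returned by `searchForMario` as Mario's position; using the top-left corner instead would work just as well if that suits your grid better.

```python
def pixel_to_block(x, y, block_w=20, block_h=15):
    """Convert a pixel coordinate (origin at top left) to a block coordinate."""
    return x // block_w, y // block_h


def window_to_block(x1, x2, y1, y2, block_w=20, block_h=15):
    """Convert a window (in the order returned by searchForMario) to a block coordinate.

    Assumption: the window's centre pixel decides which block it belongs to.
    """
    center_x = (x1 + x2) // 2
    center_y = (y1 + y2) // 2
    return pixel_to_block(center_x, center_y, block_w, block_h)


# A 16x12 window in the upper-left corner of the screen maps to block (0, 0).
print(window_to_block(0, 16, 0, 12))
# A window further into the frame.
print(window_to_block(40, 56, 30, 42))
```

With a 20-pixel block width, the last pixel column (x = 255) still lands in column 12, so the result stays within the 13 columns of the grid.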
