Bounding box regression

Question

I generated a dataset of 200 x 200 x 3 images, each containing a 40 x 40 box of a different color. I want to create a model using TensorFlow that can predict the coordinates of this 40 x 40 box.

The code I used for generating these images:


import os
from random import randrange, sample

from PIL import Image, ImageDraw

colors = ["#ffd615", "#f9ff21", "#00d1ff",
          "#0e153a", "#fc5c9c", "#ac3f21",
          "#40514e", "#492540", "#ff8a5c",
          "#000000", "#a6fff2", "#f0f696",
          "#d72323", "#dee1ec", "#fcb1b1"]

def generate_image(color):
    # Create a 200 x 200 RGB image filled with a single background color.
    img = Image.new(mode="RGB", size=(200, 200), color=color)
    return img

def save_image(img, imgname):
    img.save(imgname)

def draw_rect(image, color, x, y):
    # Draw a 40 x 40 box whose top-left corner is (x, y).
    draw = ImageDraw.Draw(image)
    coords = ((x, y), (x+40, y), (x+40, y+40), (x, y+40))
    draw.polygon(coords, fill=color)
    # Return the image and the box extent in the order x0, x1, y0, y1.
    return image, coords[0][0], coords[2][0], coords[0][1], coords[2][1]

FILE_NAME = "train_annotations.txt"

# Make sure the output directory exists before saving images.
os.makedirs("dataset/train_images", exist_ok=True)

for i in range(0, 100):
    # Pick two distinct colors so the box never matches the background.
    background, box_color = sample(colors, 2)
    img = generate_image(background)
    img, x0, x1, y0, y1 = draw_rect(img, box_color, randrange(200 - 50), randrange(200 - 50))
    save_image(img, "dataset/train_images/img"+str(i)+".png")
    # One annotation line per image: x0 x1 y0 y1.
    with open(FILE_NAME, "a+") as f:
        f.write(f"{x0} {x1} {y0} {y1}\n")

Can anyone suggest how I can build a model that predicts the box coordinates for a new image?

Answer 1

Well, the easiest way to separate the box from the background is K-means clustering with K = 2. Collect the RGB values of all the pixels, then use K-means to split them into two groups: one is the background group, the other is the box color group. The box group is the smaller one, since the box covers only 40 x 40 of the 200 x 200 pixels. Map the pixels in the box group back to their original coordinates and take the mean of those coordinates. That gives the center of the 40x40 box, and the corners lie 20 pixels to either side of it.

Documentation on how to do K-means in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/compat/v1/estimator/experimental/KMeans
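
The clustering idea in this answer can be sketched in a few lines. This is only a rough illustration, not the TensorFlow estimator linked in the answer: it uses plain NumPy, the function name locate_box is made up for the example, and it assumes the image is an (H, W, 3) array with exactly one box of side 40 whose color differs from the background.

```python
import numpy as np

def locate_box(image, n_iter=10, side=40):
    """Estimate (x0, y0, x1, y1) of the box using K-means with K = 2."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)

    # Start the two centroids at the first pixel and at the pixel
    # farthest from it, so they begin in different color groups.
    c0 = pixels[0]
    c1 = pixels[np.argmax(((pixels - c0) ** 2).sum(axis=1))]
    centroids = np.stack([c0, c1])

    for _ in range(n_iter):
        # Assignment step: nearest centroid for every pixel.
        dists = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its pixels.
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = pixels[labels == k].mean(axis=0)

    # Assumption: the box covers far fewer pixels than the background,
    # so the smaller cluster is the box.
    box_label = np.argmin(np.bincount(labels, minlength=2))
    ys, xs = np.nonzero(labels.reshape(h, w) == box_label)
    if xs.size == 0:
        raise ValueError("only one color found; box matches the background")

    # The mean pixel coordinate is the box center; the corners lie
    # half a side away from it.
    cx, cy = xs.mean(), ys.mean()
    half = side / 2
    return cx - half, cy - half, cx + half, cy + half
```

To try it on one of the generated files, an image can be loaded with PIL and converted with np.asarray(Image.open(path).convert("RGB")) before being passed to locate_box. The result is in pixel units and may be off by a fraction of a pixel depending on how the box edges were drawn.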

Answer 2

A bounding box regression is enough here: add a fully connected layer after the CNN with 4 output values, x1, y1, x2, y2, which are the top-left and bottom-right corners of the box. Something similar can be found here: https://github.com/sabhatina/bounding-box-regression
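
As a rough sketch of what this answer describes, a small Keras model might look like the following. The layer sizes, the sigmoid output, the MSE loss, and the helper names build_model and annotations_to_targets are all assumptions made for the example and are not taken from the linked repository. The targets are scaled to [0, 1] by dividing by the image size, and the annotation order "x0 x1 y0 y1" from the question is rearranged to (x1, y1, x2, y2).

```python
import numpy as np
import tensorflow as tf

def build_model(input_shape=(200, 200, 3)):
    """Small CNN followed by a dense layer with 4 outputs: x1, y1, x2, y2."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        # Regression head: 4 values in [0, 1], one per box coordinate.
        tf.keras.layers.Dense(4, activation="sigmoid"),
    ])
    # Mean squared error between predicted and true coordinates.
    model.compile(optimizer="adam", loss="mse")
    return model

def annotations_to_targets(lines, size=200.0):
    """Turn 'x0 x1 y0 y1' lines into (x1, y1, x2, y2) targets in [0, 1]."""
    raw = np.array([[float(v) for v in line.split()] for line in lines],
                   dtype=np.float32)
    left, right, top, bottom = raw.T
    # Top-left corner first, then bottom-right corner.
    return np.stack([left, top, right, bottom], axis=1) / size
```

Training would then be a call such as model.fit(images, targets, epochs=20, validation_split=0.1), where images is a float32 array of shape (N, 200, 200, 3) scaled to [0, 1] and the epoch count is only a placeholder. Predictions are multiplied by 200 to get back to pixel coordinates.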
