How to convert polygon coordinates to a rectangle (YOLO format) for image labelling?

I am trying to read water meter readings with OCR, but my first step is to find the ROI. I found a Kaggle dataset with labeled ROIs, but the labels are polygons rather than rectangles, with a different number of points depending on the image (the examples below have 8, 6 and 4). How do I convert these to YOLO format? For example:

file name | value | coordinates

id_53_value_595_825.jpg 595.825 {'type': 'polygon', 'data': [{'x': 0.30788, 'y': 0.30207}, {'x': 0.30676, 'y': 0.32731}, {'x': 0.53501, 'y': 0.33068}, {'x': 0.53445, 'y': 0.33699}, {'x': 0.56529, 'y': 0.33741}, {'x': 0.56697, 'y': 0.29786}, {'x': 0.53501, 'y': 0.29786}, {'x': 0.53445, 'y': 0.30417}]}

id_553_value_65_475.jpg 65.475 {'type': 'polygon', 'data': [{'x': 0.26133, 'y': 0.24071}, {'x': 0.31405, 'y': 0.23473}, {'x': 0.31741, 'y': 0.26688}, {'x': 0.30676, 'y': 0.26763}, {'x': 0.33985, 'y': 0.60851}, {'x': 0.29386, 'y': 0.61449}]}

id_407_value_21_86.jpg 21.86 {'type': 'polygon', 'data': [{'x': 0.27545, 'y': 0.19134}, {'x': 0.37483, 'y': 0.18282}, {'x': 0.38935, 'y': 0.76071}, {'x': 0.28185, 'y': 0.76613}]}

I understand that for YOLO format I need xmin, ymin, xmax and ymax so that I can calculate the width and height, but I am having trouble parsing the data. Could anyone help?
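For the parsing step itself, here is a minimal sketch using the third example row. It assumes the coordinates cell is a Python-literal string exactly as shown above (single quotes), which is why `json.loads` fails on it and `ast.literal_eval` works. The helper names `parse_polygon` and `polygon_extent` are hypothetical, not part of any library.

```python
import ast
import json

# The coordinates cell for id_407_value_21_86.jpg, copied from the question.
location = ("{'type': 'polygon', 'data': [{'x': 0.27545, 'y': 0.19134}, "
            "{'x': 0.37483, 'y': 0.18282}, {'x': 0.38935, 'y': 0.76071}, "
            "{'x': 0.28185, 'y': 0.76613}]}")

# json.loads rejects the cell because keys and strings use single quotes.
try:
    json.loads(location)
except json.JSONDecodeError as err:
    print("not JSON:", err.msg)


def parse_polygon(cell):
    """Return the polygon as a list of (x, y) tuples (hypothetical helper)."""
    # ast.literal_eval safely evaluates Python literals: dicts, lists, floats.
    shape = ast.literal_eval(cell)
    return [(p['x'], p['y']) for p in shape['data']]


def polygon_extent(points):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


points = parse_polygon(location)
print(polygon_extent(points))  # (0.27545, 0.18282, 0.38935, 0.76613)
```

The same two helpers work unchanged for the 8-point and 6-point rows, since only the min and max along each axis matter.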

Thank you.

Edit: Finally, it worked out. In case anyone is struggling to convert the CSV file from https://www.kaggle.com/datasets/tapakah68/yandextoloka-water-meters-dataset to YOLO format, here is my code snippet that creates one text file per image.

import ast
import os

import pandas as pd


def converttoyolo(csv_file, out_dir, class_id=0):
    """Write one YOLO label file per image listed in the CSV.

    class_id=0 assumes a single class (the reading ROI).
    """
    df = pd.read_csv(csv_file)
    for _, row in df.iterrows():  # one row per image
        # label file gets the image's base name, e.g. id_53_value_595_825.txt
        stem = os.path.splitext(row['photo_name'])[0]

        # 'location' is a Python-literal string (single quotes), so parse it
        # with ast.literal_eval rather than json.loads
        polygon = ast.literal_eval(row['location'])
        xs = [p['x'] for p in polygon['data']]
        ys = [p['y'] for p in polygon['data']]

        # axis-aligned bounding box of the polygon; the coordinates are
        # already normalized to 0..1, so no division by image size is needed
        min_x, max_x = min(xs), max(xs)
        min_y, max_y = min(ys), max(ys)
        width = max_x - min_x
        height = max_y - min_y
        center_x = min_x + width / 2
        center_y = min_y + height / 2

        # YOLO line: "class_id center_x center_y width height", space-separated
        with open(os.path.join(out_dir, stem + '.txt'), 'w') as f:
            f.write(f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}\n")


csv_file = 'data.csv'  # placeholder: path to the dataset's CSV file
converttoyolo(csv_file, '/content/drive/MyDrive/yolo/custom_data/jpeg/')

You need to create a contour (a list of points) for each shape. Once you have that, call cv::boundingRect() to turn each contour into a single bounding rectangle. From the rectangle you get X, Y, W and H. But since YOLO uses the center (CX, CY) rather than the top-left corner (X, Y), you need to compute:

CX = X + W/2.0
CY = Y + H/2.0
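The Kaggle coordinates in the question are normalized to 0..1, so to follow these steps literally they first have to be scaled to pixels using the image's width and height (which the question does not list). Below is a minimal sketch under that assumption. `bounding_rect` is a plain-Python stand-in for what cv2.boundingRect returns for integer points, so it runs without OpenCV; the helper names are hypothetical.

```python
def to_pixels(points, img_w, img_h):
    """Scale normalized (0..1) points to integer pixel coordinates."""
    return [(round(x * img_w), round(y * img_h)) for x, y in points]


def bounding_rect(contour):
    """Plain-Python stand-in for cv2.boundingRect on integer points.

    Returns (x, y, w, h). Like OpenCV, w and h are inclusive pixel counts,
    so a contour spanning columns 10..20 has w == 11.
    """
    xs = [px for px, _ in contour]
    ys = [py for _, py in contour]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x + 1, max(ys) - y + 1


def rect_center(x, y, w, h):
    """CX = X + W/2.0 and CY = Y + H/2.0, the formula above."""
    return x + w / 2.0, y + h / 2.0


# A hypothetical 4-point contour, already in pixels.
contour = [(10, 5), (20, 5), (20, 30), (12, 28)]
x, y, w, h = bounding_rect(contour)
print((x, y, w, h))             # (10, 5, 11, 26)
print(rect_center(x, y, w, h))  # (15.5, 18.0)
```

Because of the rounding to pixels and OpenCV's inclusive width, this route can differ by a pixel or so from taking min/max directly on the normalized values, as the asker's code does.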

Lastly, you must normalize all 4 values. The YOLO format is space-delimited, and the first value is the integer class ID. So if "dog" is your second class (ID 1, since IDs are zero-based), you would output:

1 0.234 0.456 0.123 0.111

...where the 4 coordinates are:

CX / image width
CY / image height
W / image width
H / image height
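Putting the center formula and the four divisions together, a small sketch that formats one label line from a pixel rectangle. The function name `rect_to_yolo`, the box, and the 640x480 image size are made up for illustration; six decimal places is a formatting choice, not something the format requires.

```python
def rect_to_yolo(class_id, x, y, w, h, img_w, img_h):
    """Format a pixel rectangle (x, y, w, h) as one YOLO label line."""
    cx = x + w / 2.0  # center from the top-left corner
    cy = y + h / 2.0
    # normalize each value by the matching image dimension
    values = (cx / img_w, cy / img_h, w / img_w, h / img_h)
    return " ".join([str(class_id)] + [f"{v:.6f}" for v in values])


# "dog" is class 1 as in the example above; the box and image size are made up.
print(rect_to_yolo(1, 100, 50, 200, 100, 640, 480))
# 1 0.312500 0.208333 0.312500 0.208333
```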

If you want more examples of the math, see the Darknet/YOLO FAQ: https://www.ccoderun.ca/programming/darknet_faq/#darknet_annotations
