I am trying to read water meter readings through OCR, and my first step is to find the ROI. I found a dataset on Kaggle with labeled data for the ROI, but the labels are polygons rather than rectangles, some with 5 points and some with 8, depending on the image. How do I convert this to YOLO format? For example:
file name | value | coordinates
id_53_value_595_825.jpg | 595.825 | {'type': 'polygon', 'data': [{'x': 0.30788, 'y': 0.30207}, {'x': 0.30676, 'y': 0.32731}, {'x': 0.53501, 'y': 0.33068}, {'x': 0.53445, 'y': 0.33699}, {'x': 0.56529, 'y': 0.33741}, {'x': 0.56697, 'y': 0.29786}, {'x': 0.53501, 'y': 0.29786}, {'x': 0.53445, 'y': 0.30417}]}
id_553_value_65_475.jpg | 65.475 | {'type': 'polygon', 'data': [{'x': 0.26133, 'y': 0.24071}, {'x': 0.31405, 'y': 0.23473}, {'x': 0.31741, 'y': 0.26688}, {'x': 0.30676, 'y': 0.26763}, {'x': 0.33985, 'y': 0.60851}, {'x': 0.29386, 'y': 0.61449}]}
id_407_value_21_86.jpg | 21.86 | {'type': 'polygon', 'data': [{'x': 0.27545, 'y': 0.19134}, {'x': 0.37483, 'y': 0.18282}, {'x': 0.38935, 'y': 0.76071}, {'x': 0.28185, 'y': 0.76613}]}
I understand that for YOLO format I need xmin, ymin, xmax, and ymax so that I can calculate the width and height, but I am having trouble parsing the data. Could anyone help?
Thank you.
Edit: Finally, it worked out. In case anyone is struggling with converting the CSV file from https://www.kaggle.com/datasets/tapakah68/yandextoloka-water-meters-dataset to YOLO format, here is my code snippet, which just creates a text file for each image.
import ast
import pandas as pd

def converttoyolo(csv_file):
    df = pd.read_csv(csv_file)
    for i in range(len(df)):
        df_row = df.iloc[i]  # get each row
        name = df_row['photo_name'].split('.')[0]  # image name, used for the text file
        location = ast.literal_eval(df_row['location'])  # str to dict
        x = []
        y = []
        for point in location['data']:  # put each x and y for each row in separate lists
            x.append(point['x'])
            y.append(point['y'])
        max_x = max(x)
        max_y = max(y)  # yolo conversion, check answer below
        min_x = min(x)
        min_y = min(y)
        width = max_x - min_x
        height = max_y - min_y
        center_x = min_x + (width / 2)
        center_y = min_y + (height / 2)
        # The coordinates in this dataset appear to be normalized to 0..1 already,
        # so no division by the image size is done here.
        # YOLO line: class id, center x, center y, width, height, space-delimited.
        # Class id 0 is an assumption (a single "meter reading" class).
        with open('/content/drive/MyDrive/yolo/custom_data/jpeg/' + name + '.txt', 'w') as file:  # put in text files
            file.write(f'0 {center_x} {center_y} {width} {height}\n')

converttoyolo(csv_file)  # csv_file is the path to the dataset CSV
You need to create a contour (a list of points) for each shape. Once you have that, call cv::boundingRect() to turn each contour into a single bounding rectangle. Once you have the rectangle, you can figure out X, Y, W, and H. But since the YOLO format uses CX and CY (the center of the rectangle), not X and Y, you need to do:
CX = X + W/2.0
CY = Y + H/2.0
Lastly, you must normalize all 4 values. The YOLO format is space delimited, and the first value is the integer class ID. So if "dog" is your 2nd class (thus id #1 since it is zero-based), then you'd output:
1 0.234 0.456 0.123 0.111
...where the 4 coordinates are:
CX / image width
CY / image height
W / image width
H / image height
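The steps above can be sketched in plain Python, without OpenCV, by taking the min and max of the polygon points as the bounding rectangle. This is only a minimal sketch: the function name polygon_to_yolo and its parameters are made up for illustration, and class id 0 is an assumption. The sample polygon is the third row from the question. Its coordinates already look normalized (between 0 and 1), so the image size defaults to 1.0 and the division changes nothing; pass the real width and height if your points are in pixels.

```python
import ast

def polygon_to_yolo(location, class_id=0, img_w=1.0, img_h=1.0):
    """Build one YOLO label line from a polygon annotation (hypothetical helper).

    location: dict with a 'data' list of {'x', 'y'} points, or its string form.
    img_w, img_h: leave at 1.0 when the points are already normalized;
    pass the image size in pixels when the points are pixel coordinates.
    """
    if isinstance(location, str):
        location = ast.literal_eval(location)  # string from the CSV -> dict
    xs = [p["x"] for p in location["data"]]
    ys = [p["y"] for p in location["data"]]
    # Axis-aligned bounding rectangle of the polygon: X, Y, W, H
    x, y = min(xs), min(ys)
    w, h = max(xs) - x, max(ys) - y
    # YOLO wants the center of the rectangle, not its top-left corner
    cx, cy = x + w / 2.0, y + h / 2.0
    # Normalize and join with spaces: class id first, then CX CY W H
    return f"{class_id} {cx / img_w:.6f} {cy / img_h:.6f} {w / img_w:.6f} {h / img_h:.6f}"

sample = ("{'type': 'polygon', 'data': [{'x': 0.27545, 'y': 0.19134}, "
          "{'x': 0.37483, 'y': 0.18282}, {'x': 0.38935, 'y': 0.76071}, "
          "{'x': 0.28185, 'y': 0.76613}]}")
print(polygon_to_yolo(sample))
```

Each line returned this way would go into a .txt file named after the image, one line per labeled object.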
If you want more examples of the math, see the Darknet/YOLO FAQ: https://www.ccoderun.ca/programming/darknet_faq/#darknet_annotations