
How to prepare my images and annotations for RetinaNet training?

I followed this tutorial for training an object-detection model on the COCO dataset. The tutorial includes a step to download the COCO dataset with its annotations and convert them to TFRecord.

I need to train on my own custom data, which I annotated using the labelImg tool; it produces XML files containing (w, h, xmin, ymin, xmax, ymax) for each image.

But the COCO dataset uses a JSON format with image-segmentation fields for creating a TFRecord.

Is segmentation mandatory for training ResNet/RetinaNet?

So, can anyone guide me through a procedure for creating a JSON annotation from my XML annotations without segmentation values?

XML:

<annotation>
    <folder>frames</folder>
    <filename>83.jpg</filename>
    <path>/home/tdadmin/Downloads/large/f/frames/83.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>246</xmin>
            <ymin>48</ymin>
            <xmax>350</xmax>
            <ymax>165</ymax>
        </bndbox>
    </object>
</annotation>

What you are doing now is similar to a project I've done before, so I have some suggestions for you.

When I was training my Mask R-CNN model, I used the VGG Image Annotator (you can easily find it on Google). With that tool it's easy to create JSON annotation files; then plug them into your training.

Hope that helps. Feel free to comment if you still have questions.

Rowen

The annotation format actually doesn't matter; I have created tfrecords from txt files before. To create a custom tfrecord you would have to write your own create_custom_tf_record.py, just like the others shown in this folder.

But since your annotations are COCO-like, you can make use of the file create_coco_tf_record.py. The important thing you need to implement yourself is the annotations_list. The annotations_list is just a dictionary, so your goal is to parse your XML file into a dictionary of key-value pairs, pass the correct values into the feature_dict, and then construct a tf.train.Example from the feature_dict. Once you have the tf.train.Example created, you can create the tfrecord easily.

So for your exact example, first parse the XML file:

import xml.etree.ElementTree as ET
tree = ET.parse('annotations.xml')

Then construct the annotations_list from the tree like this:

annotations_list = {}
it = tree.iter()
for key in it:
    annotations_list[str(key.tag)] = key.text
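Note that tree.iter() flattens the whole tree, so if an image has several <object> entries, later tags overwrite earlier ones in the dictionary. A possible sketch (my own helper, not from the tutorial) that instead keeps one record per object:

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_path):
    """Parse a VOC-style XML file into image info plus one entry per object."""
    root = ET.parse(xml_path).getroot()
    info = {
        'filename': root.findtext('filename'),
        'width': int(root.findtext('size/width')),
        'height': int(root.findtext('size/height')),
        'objects': [],
    }
    # Each <object> element carries its own class name and bounding box.
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        info['objects'].append({
            'name': obj.findtext('name'),
            'xmin': int(bbox.findtext('xmin')),
            'ymin': int(bbox.findtext('ymin')),
            'xmax': int(bbox.findtext('xmax')),
            'ymax': int(bbox.findtext('ymax')),
        })
    return info
```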

Then you can create the feature_dict from the annotations_list:

feature_dict = {
  'image/height':
      dataset_util.int64_feature(int(annotations_list['height'])),
  'image/width':
      dataset_util.int64_feature(...),
  'image/filename':
      dataset_util.bytes_feature(...),
  'image/source_id':
      dataset_util.bytes_feature(...),
  'image/key/sha256':
      dataset_util.bytes_feature(...),
  'image/encoded':
      dataset_util.bytes_feature(...),
  'image/format':
      dataset_util.bytes_feature(...),
  'image/object/bbox/xmin':
      dataset_util.float_list_feature(...),
  'image/object/bbox/xmax':
      dataset_util.float_list_feature(...),
  'image/object/bbox/ymin':
      dataset_util.float_list_feature(...),
  'image/object/bbox/ymax':
      dataset_util.float_list_feature(...),
  'image/object/class/text':
      dataset_util.bytes_list_feature(....),
  'image/object/is_crowd':
      dataset_util.int64_list_feature(...),
  'image/object/area':
      dataset_util.float_list_feature(...),
  }
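Once the feature_dict is filled in, the example can be serialized and written out. A minimal sketch, assuming TensorFlow 2.x and using hand-rolled stand-ins for the dataset_util helpers (the values for 83.jpg are taken from the XML above):

```python
import tensorflow as tf

# Minimal stand-ins for object_detection's dataset_util helpers.
def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def float_list_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

# Values for the single-person annotation of 83.jpg shown earlier;
# bbox coordinates are normalized by image width/height.
feature_dict = {
    'image/height': int64_feature(480),
    'image/width': int64_feature(640),
    'image/filename': bytes_feature(b'83.jpg'),
    'image/object/bbox/xmin': float_list_feature([246 / 640]),
    'image/object/bbox/xmax': float_list_feature([350 / 640]),
    'image/object/bbox/ymin': float_list_feature([48 / 480]),
    'image/object/bbox/ymax': float_list_feature([165 / 480]),
    'image/object/class/text': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b'person'])),
}

example = tf.train.Example(features=tf.train.Features(feature=feature_dict))

# Write the serialized example into a tfrecord file.
with tf.io.TFRecordWriter('train.record') as writer:
    writer.write(example.SerializeToString())
```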

Just make sure each feature_dict field corresponds to the correct field from the annotations_list and the label_map.

You may wonder why exactly these fields in the feature_dict are necessary. According to the official documentation on using your own dataset, the following fields are required and the others are optional:

  'image/height': dataset_util.int64_feature(height),
  'image/width': dataset_util.int64_feature(width),
  'image/filename': dataset_util.bytes_feature(filename),
  'image/source_id': dataset_util.bytes_feature(filename),
  'image/encoded': dataset_util.bytes_feature(encoded_image_data),
  'image/format': dataset_util.bytes_feature(image_format),
  'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
  'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
  'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
  'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
  'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
  'image/object/class/label': dataset_util.int64_list_feature(classes),
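One detail the snippet above implies but is easy to miss: the bbox fields expect floats normalized to [0, 1], not the absolute pixel values from the XML. A quick sketch using the box from 83.jpg:

```python
# Image size and pixel-space boxes from the XML above.
width, height = 640, 480
boxes = [(246, 48, 350, 165)]  # (xmin, ymin, xmax, ymax) in pixels

# The tfrecord bbox fields expect coordinates normalized to [0, 1]:
# divide x by the image width and y by the image height.
xmins = [b[0] / width for b in boxes]
ymins = [b[1] / height for b in boxes]
xmaxs = [b[2] / width for b in boxes]
ymaxs = [b[3] / height for b in boxes]
```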
