
Best strategy to reduce false positives: Google's new Object Detection API on Satellite Imagery

I'm setting up the new Tensorflow Object Detection API to find small objects in large areas of satellite imagery. It works quite well - it finds all 10 objects I want, but I also get 50-100 false positives [things that look a little like the target object, but aren't].

I'm using the sample config from the 'pets' tutorial to fine-tune the faster_rcnn_resnet101_coco model they offer. I've started small, with only 100 training examples of my objects (just 1 class) and 50 examples in my validation set. Each example is a 200x200 pixel image with a labeled object (~40x40) in the center. I train until my precision & loss curves plateau.

I'm relatively new to using deep learning for object detection. What is the best strategy to increase my precision? e.g. hard-negative mining? Increasing my training dataset size? I've yet to try the most accurate model they offer, faster_rcnn_inception_resnet_v2_atrous_coco, as I'd like to maintain some speed, but I will do so if needed.

Hard-negative mining seems to be a logical step. If you agree, how do I implement it with respect to setting up the tfrecord file for my training dataset? Let's say I make 200x200 images for each of the 50-100 false positives:

  • Do I create 'annotation' xml files for each, with no 'object' element?
  • ...or do I label these hard negatives as a second class?
  • If I then have 100 negatives to 100 positives in my training set - is that a healthy ratio? How many negatives can I include?

===================================================================

I've revisited this topic recently in my work and thought I'd update with my current learnings for any who visit in the future.

The topic came up on Tensorflow's Models repo issue tracker. SSD allows you to set the ratio of negative:positive examples to mine (max_negatives_per_positive: 3), and you can also set a minimum number of negatives for images with no positives (min_negatives_per_image: 3). Both of these are defined in the model-ssd-loss config section.
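For reference, a typical hard_example_miner block inside the loss section of an SSD pipeline config looks roughly like this (the field names match the losses proto used by build_hard_example_miner further down; the numbers are just example values you would tune):

loss {
  classification_loss { weighted_sigmoid { } }
  localization_loss { weighted_smooth_l1 { } }
  hard_example_miner {
    num_hard_examples: 3000
    iou_threshold: 0.99
    loss_type: CLASSIFICATION
    max_negatives_per_positive: 3
    min_negatives_per_image: 3
  }
  classification_weight: 1.0
  localization_weight: 1.0
}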

That said, I don't see the same option in Faster-RCNN's model configuration. It's mentioned in the issue that models/research/object_detection/core/balanced_positive_negative_sampler.py contains the code used for Faster-RCNN.
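I haven't traced exactly how that sampler is wired into the Faster-RCNN pipeline, but conceptually it just subsamples each minibatch to a fixed positive fraction. A rough standalone sketch of the idea (not the library's code; labels, batch_size and positive_fraction are assumed inputs):

import numpy as np

def subsample_balanced(labels, batch_size=256, positive_fraction=0.5):
    """labels: array with 1 for positive anchors/RoIs, 0 for negatives.
    Returns indices of a minibatch with at most positive_fraction positives."""
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    num_pos = min(len(pos_idx), int(batch_size * positive_fraction))
    num_neg = min(len(neg_idx), batch_size - num_pos)
    sampled_pos = np.random.choice(pos_idx, size=num_pos, replace=False)
    sampled_neg = np.random.choice(neg_idx, size=num_neg, replace=False)
    return np.concatenate([sampled_pos, sampled_neg])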

One other option discussed in the issue is creating a second class specifically for lookalikes. During training, the model will attempt to learn the class differences, which should help serve your purpose.

Lastly, I came across this article on Filter Amplifier Networks (FAN) that may be informative for your work on aerial imagery.

===================================================================

The following paper describes hard-negative mining for the same purpose you describe: Training Region-based Object Detectors with Online Hard Example Mining

In section 3.1 they describe using a foreground and background class:

Background RoIs. A region is labeled background (bg) if its maximum IoU with ground truth is in the interval [bg_lo, 0.5). A lower threshold of bg_lo = 0.1 is used by both FRCN and SPPnet, and is hypothesized in [14] to crudely approximate hard negative mining; the assumption is that regions with some overlap with the ground truth are more likely to be the confusing or hard ones. We show in Section 5.4 that although this heuristic helps convergence and detection accuracy, it is suboptimal because it ignores some infrequent, but important, difficult background regions. Our method removes the bg_lo threshold.
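To make the quoted rule concrete, here is a small sketch of that labeling step (compute_iou is an assumed helper returning the [num_rois, num_gt] IoU matrix; the thresholds are the ones from the paper):

import numpy as np

def label_rois(rois, gt_boxes, fg_thresh=0.5, bg_lo=0.1):
    ious = compute_iou(rois, gt_boxes)               # assumed helper, shape [num_rois, num_gt]
    max_iou = ious.max(axis=1)                       # best overlap of each RoI with any ground truth
    fg = max_iou >= fg_thresh                        # foreground RoIs
    bg = (max_iou >= bg_lo) & (max_iou < fg_thresh)  # background RoIs in [bg_lo, 0.5)
    # OHEM drops the bg_lo threshold (i.e. bg = max_iou < fg_thresh) and instead
    # selects the highest-loss RoIs to back-propagate.
    return fg, bg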

In fact, this paper is referenced and its ideas are used in Tensorflow's object detection losses.py code for hard mining:

class HardExampleMiner(object):
  """Hard example mining for regions in a list of images.
  Implements hard example mining to select a subset of regions to be
  back-propagated. For each image, selects the regions with highest losses,
  subject to the condition that a newly selected region cannot have
  an IOU > iou_threshold with any of the previously selected regions.
  This can be achieved by re-using a greedy non-maximum suppression algorithm.
  A constraint on the number of negatives mined per positive region can also be
  enforced.
  Reference papers: "Training Region-based Object Detectors with Online
  Hard Example Mining" (CVPR 2016) by Srivastava et al., and
  "SSD: Single Shot MultiBox Detector" (ECCV 2016) by Liu et al.
  """

Based on your model config file, the HardMinerObject is returned by losses_builder.py in this bit of code:

def build_hard_example_miner(config,
                             classification_weight,
                             localization_weight):
  """Builds hard example miner based on the config.
  Args:
    config: A losses_pb2.HardExampleMiner object.
    classification_weight: Classification loss weight.
    localization_weight: Localization loss weight.
  Returns:
    Hard example miner.
  """
  loss_type = None
  if config.loss_type == losses_pb2.HardExampleMiner.BOTH:
    loss_type = 'both'
  if config.loss_type == losses_pb2.HardExampleMiner.CLASSIFICATION:
    loss_type = 'cls'
  if config.loss_type == losses_pb2.HardExampleMiner.LOCALIZATION:
    loss_type = 'loc'

  max_negatives_per_positive = None
  num_hard_examples = None
  if config.max_negatives_per_positive > 0:
    max_negatives_per_positive = config.max_negatives_per_positive
  if config.num_hard_examples > 0:
    num_hard_examples = config.num_hard_examples
  hard_example_miner = losses.HardExampleMiner(
      num_hard_examples=num_hard_examples,
      iou_threshold=config.iou_threshold,
      loss_type=loss_type,
      cls_loss_weight=classification_weight,
      loc_loss_weight=localization_weight,
      max_negatives_per_positive=max_negatives_per_positive,
      min_negatives_per_image=config.min_negatives_per_image)
  return hard_example_miner

This is returned by model_builder.py and called by train.py. So basically, it seems to me that simply generating your true positive labels (with a tool like LabelImg or RectLabel) should be enough for the training algorithm to find hard negatives within the same images. The related question gives an excellent walkthrough.

In the event you want to feed in data that has no true positives (i.e. nothing should be classified in the image), just add the negative image to your tfrecord with no bounding boxes.
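A minimal sketch of such a negative example, assuming the dataset_util helpers from the Object Detection API and that height, width, filename and encoded_jpg were read from the image as in the answer below - the box and class lists are simply left empty:

negative_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_jpg),
    'image/format': dataset_util.bytes_feature(b'jpg'),
    'image/object/bbox/xmin': dataset_util.float_list_feature([]),
    'image/object/bbox/xmax': dataset_util.float_list_feature([]),
    'image/object/bbox/ymin': dataset_util.float_list_feature([]),
    'image/object/bbox/ymax': dataset_util.float_list_feature([]),
    'image/object/class/text': dataset_util.bytes_list_feature([]),
    'image/object/class/label': dataset_util.int64_list_feature([]),
}))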

===================================================================

I think I was passing through the same or a close scenario, and it's worth sharing with you.

I managed to solve it by passing images without annotations to the trainer.

In my scenario I'm building a project to detect assembly failures in my client's products, in real time. I successfully achieved very robust results (for a production environment) by using detection+classification for components that explicitly have a negative pattern (e.g. a screw that is either screwed in or not - just the hole) and detection only for things that don't have a negative pattern (e.g. a tape that can be placed anywhere).

In the system it's mandatory that the user records 2 videos, one containing the positive scenario and another containing the negative one (or n videos, containing n patterns of positives and negatives, so the algorithm can generalize).

After a while testing, I found out that if I trained the detector on tape only, it was giving very confident (0.999) false positive detections of tape. It was learning the pattern of where the tape was inserted instead of the tape itself. When I had another component (like a screw in its negative form), I was passing the negative pattern of the tape without being explicitly aware of it, so the FPs didn't happen.

So I found out that, in this scenario, I necessarily had to pass the images without tape so it could differentiate between tape and no-tape.

I considered two alternatives to experiment with and try to solve this behavior:

  1. Train passing a considerable number of images that don't have any annotation (10% of all my negative samples) along with all the images for which I have real annotations.
  2. On the images where I don't have annotations, create a dummy annotation with a dummy label so I could force the detector to train with that image (thus learning the no-tape pattern). Later on, when getting the dummy predictions, just ignore them (see the sketch after this list).
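For the second alternative, the post-processing is just a filter on the predicted class. A minimal sketch (hypothetical names; DUMMY_CLASS_ID is whatever id the dummy label gets in the label map, and boxes/scores/classes are assumed to be numpy arrays returned by the detector for one image):

import numpy as np

DUMMY_CLASS_ID = 2  # hypothetical: the id assigned to the dummy label in the label map

def drop_dummy_detections(boxes, scores, classes):
    keep = classes != DUMMY_CLASS_ID
    return boxes[keep], scores[keep], classes[keep]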

I concluded that both alternatives worked perfectly in my scenario. The training loss got a little messy, but the predictions are robust for my very controlled scenario (the system's camera has its own box and illumination to decrease variables).

I had to make two little modifications for the first alternative to work:

  1. For all images that didn't have any annotation, I passed a dummy annotation (class=None, xmin/ymin/xmax/ymax=-1).
  2. When generating the tfrecord files, I use this information (xmin == -1, in this case) to add an empty list for the sample:
import io
import os

import pandas as pd
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util


def create_tf_example(group, path, label_map):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'

    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # Rows with xmin == -1 are the dummy annotations used for negative
        # images: skip them so the example keeps empty box/class lists.
        if not pd.isnull(row.xmin):
            if not row.xmin == -1:
                xmins.append(row['xmin'] / width)
                xmaxs.append(row['xmax'] / width)
                ymins.append(row['ymin'] / height)
                ymaxs.append(row['ymax'] / height)
                classes_text.append(row['class'].encode('utf8'))
                classes.append(label_map[row['class']])

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
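For completeness, a usage sketch of how the function above could be driven when writing the tfrecord (assumptions: examples is a pandas DataFrame with one row per annotation - dummy rows using xmin == -1 - and label_map maps class names to integer ids):

import collections

def write_tfrecord(examples, image_dir, label_map, output_path):
    data = collections.namedtuple('data', ['filename', 'object'])
    writer = tf.python_io.TFRecordWriter(output_path)
    for filename, rows in examples.groupby('filename'):
        tf_example = create_tf_example(data(filename, rows), image_dir, label_map)
        writer.write(tf_example.SerializeToString())
    writer.close()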

Part of the training progress:

[training progress screenshot]

Currently I'm using the Tensorflow Object Detection API with tensorflow==1.15, using faster_rcnn_resnet101_coco.config.

Hope it solves someone's problem, as I didn't find any solution on the internet. I've read a lot of people saying that faster_rcnn is not suited to negative training for FP reduction, but my tests proved the opposite.
