
Same label multiple times in one image - Tensorflow

I'm trying to create a TF model that can detect any handwriting in any image. To do that, I annotated all the training pictures with just one label: edit. This means one image can carry this label many times.

After many hours of training on CPU, I didn't get the expected result. The model can't detect any of the blocks I labeled before training.

I'm using the following model:

http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d6_coco17_tpu-32.tar.gz

Is the problem that I'm labeling one image with the same label multiple times?

Could the problem be that I'm using a CPU instead of a GPU? I currently have one GPU with 4 GB of memory, and it seems not to be enough. I trained the model for 2000 steps with a learning_rate of 0.006. Should I train it for more steps than that?

Any suggestions?

Thank you in advance.

Edit

The following is a screenshot from Tensorboard of the trained model:

[Screenshot: Tensorboard of the trained model]

CPU vs GPU

The only advantage of a GPU is that training is faster. It shouldn't have any effect on the expected result; training on a CPU just takes longer, though for some models the difference can be large. Monitoring your training might give you more insight.
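As a quick sanity check, you can ask TensorFlow which devices it can actually see; if the GPU list is empty, training silently falls back to the CPU. A minimal sketch:

```python
import tensorflow as tf

# List the accelerators TensorFlow can use. An empty GPU list means
# training runs on CPU: same expected results, longer wall-clock time.
gpus = tf.config.list_physical_devices("GPU")
cpus = tf.config.list_physical_devices("CPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
print(f"CPUs visible to TensorFlow: {len(cpus)}")
```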

Training monitoring

What does it mean that you did not get the expected result? How was it different from what you expected?

I suggest you use some kind of monitoring, such as Tensorboard, to track both the loss and the metrics of the training and validation datasets, if you do not already do so. This will give you invaluable information about the training in real time.
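With Keras-style training this is a one-line callback. A minimal sketch with toy data (the model, data, and `log_dir` here are placeholders, not your detection setup):

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

log_dir = tempfile.mkdtemp()  # in a real run, e.g. "logs/experiment1"

# Toy regression data standing in for the real dataset.
rng = np.random.default_rng(0)
x = rng.random((32, 4)).astype("float32")
y = rng.random((32, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# The TensorBoard callback writes training AND validation loss/metric
# curves to log_dir; inspect them live with: tensorboard --logdir <log_dir>
model.fit(
    x, y,
    validation_split=0.25,
    epochs=2,
    verbose=0,
    callbacks=[tf.keras.callbacks.TensorBoard(log_dir=log_dir)],
)
print(os.listdir(log_dir))  # event-file subdirectories were written
```

If you train via the Object Detection API instead of `model.fit`, the training script already writes event files to the model directory, so pointing `tensorboard --logdir` at it is enough.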

Pipeline debugging

When your model seems not to be learning anything, you must start debugging. You can follow these steps to make sure that none of the common pitfalls is your problem: https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607

I especially like to overfit the model on a single batch. This test tells me whether my algorithmic pipeline is correct, including the preprocessing, the model, and the evaluation. Evaluation on that single training batch should give a very good score, while the score on the rest of the data will be poor, since the model is overfitted to the single batch.
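The single-batch test above can be sketched as follows. This is a toy binary classifier, not your detection model; the point is only the pattern: train repeatedly on one small batch and verify the loss collapses.

```python
import numpy as np
import tensorflow as tf

# One small batch; a healthy pipeline should memorize it almost perfectly.
rng = np.random.default_rng(0)
x = rng.random((8, 4)).astype("float32")
y = (x.sum(axis=1, keepdims=True) > 2).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss="binary_crossentropy")

# Loss after one epoch vs. after hammering the same batch 300 more times.
first = model.fit(x, y, epochs=1, verbose=0).history["loss"][0]
last = model.fit(x, y, epochs=300, verbose=0).history["loss"][-1]
print(first, "->", last)  # the loss on the memorized batch should drop sharply
```

If the loss does not drop on a batch the model has seen hundreds of times, the bug is in the pipeline (labels, preprocessing, loss wiring), not in the amount of data or compute.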

Problem definition

Sometimes, it is possible that the problem itself is wrongly defined. Can you clarify what your labels are? I do not fully understand them. If it is a common problem, you can search the internet for inspiration on how it is usually defined.

EDIT

Loss functions

In general, you want all your loss functions to go down. In your case, they go up in the first few hundred steps, which does not necessarily indicate that something is wrong, because training sometimes takes a short while to stabilize. Nonetheless, the triangles indicate that the loss was NaN, and that means there is something wrong. I can recommend the tf.keras.callbacks.TerminateOnNaN callback to detect NaNs in your loss immediately; it will terminate your training promptly.
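Attaching the callback is a single extra argument to `fit`. A minimal sketch with toy data (the model and data here are placeholders):

```python
import numpy as np
import tensorflow as tf

# Toy regression data standing in for the real detection dataset.
rng = np.random.default_rng(0)
x = rng.random((64, 4)).astype("float32")
y = rng.random((64, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# TerminateOnNaN stops the run the moment the loss becomes NaN,
# so a diverging training fails fast instead of burning hours of CPU time.
history = model.fit(
    x, y,
    epochs=2,
    verbose=0,
    callbacks=[tf.keras.callbacks.TerminateOnNaN()],
)
print(len(history.history["loss"]))  # epochs actually completed
```

With a healthy loss the run completes all epochs; with a NaN loss it stops at the offending batch. A lower learning rate than your 0.006 is a common first fix for NaN losses.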

Metrics

From the loss functions alone, it is hard to tell what the model's performance is. In every machine learning task, you have to be able to understand the performance of a model; metrics serve exactly that purpose. In this case (object detection), I suggest using IoU to determine how well a predicted box overlaps with the target box, and precision and recall to evaluate the binary classification of whether a predicted box contains handwritten text or not.
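IoU is simple enough to compute by hand, which also makes a good unit test for your evaluation code. A minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` format (a common convention; check what your pipeline uses):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero: non-overlapping boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857...
```

A detection is then typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold (0.5 is a common choice), from which precision and recall follow.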
