tflite-model-maker custom dataset (Pascal VOC) problem - ValueError: The size of the train_data (0) couldn't be smaller than the batch_size (1)

I've recently been trying to follow the tutorial here: https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_efficientdet_model_maker_tf2.ipynb#scrollTo=ZljJ25RAnj5x

However, I'm running this on a workstation, not in Google Colab.

I've reduced the code to the bare minimum, as shown here:

import numpy as np
import os
import random
import shutil

from tflite_model_maker.config import ExportFormat
from tflite_model_maker import model_spec
from tflite_model_maker import object_detector

import tensorflow as tf
assert tf.__version__.startswith('2')

tf.get_logger().setLevel('ERROR')
from absl import logging
logging.set_verbosity(logging.ERROR)
  
label_map = {1: 'TIE', 2: 'HOLE'} 

train_images_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/training/images/'
train_annotations_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/training/annotations/'
val_images_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/validation/images/'
val_annotations_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/validation/annotations/'
test_images_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/test/images/'
test_annotations_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/test/annotations/'

train_data = object_detector.DataLoader.from_pascal_voc(train_images_dir, train_annotations_dir, label_map=label_map)
validation_data = object_detector.DataLoader.from_pascal_voc(val_images_dir, val_annotations_dir, label_map=label_map)
test_data = object_detector.DataLoader.from_pascal_voc(test_images_dir, test_annotations_dir, label_map=label_map)

print(f'train count: {len(train_data)}')
print(f'validation count: {len(validation_data)}')
print(f'test count: {len(test_data)}')

spec = object_detector.EfficientDetLite1Spec()

model = object_detector.create(train_data=train_data, model_spec=spec, validation_data=validation_data, epochs=10000, batch_size=1, train_whole_model=True)

model.evaluate(test_data)

TFLITE_FILENAME = 'efficientdet-lite.tflite'
LABELS_FILENAME = 'labels.txt'

model.export(export_dir='.', tflite_filename=TFLITE_FILENAME, label_filename=LABELS_FILENAME,export_format=[ExportFormat.TFLITE, ExportFormat.LABEL])
             
model.evaluate_tflite(TFLITE_FILENAME, test_data)

My image directories contain only images, and I've ensured that they are properly formatted JPG files.

Likewise, my annotation directories contain Pascal VOC format XML files. I used labelImg to make the annotations.

Running the above code produces the following error:

(venv) dev@trainingpc:~/Tensorflow/workspace/tpu_vision$ python3 train.py 
2021-11-18 16:02:07.008550: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
train count: 0
validation count: 0
test count: 0
2021-11-18 16:02:08.950882: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-11-18 16:02:08.983851: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:08.984135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:2c:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2021-11-18 16:02:08.984189: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:08.984437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties: 
pciBusID: 0000:41:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2021-11-18 16:02:08.984452: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-11-18 16:02:08.986983: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-11-18 16:02:08.987017: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-11-18 16:02:09.009829: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-11-18 16:02:09.009989: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-11-18 16:02:09.010322: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-11-18 16:02:09.010867: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-11-18 16:02:09.010951: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-11-18 16:02:09.011082: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.011414: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.011757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.012031: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.012257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2021-11-18 16:02:09.012704: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-18 16:02:09.171706: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.171957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:2c:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2021-11-18 16:02:09.172011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.172207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties: 
pciBusID: 0000:41:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2021-11-18 16:02:09.172259: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.172484: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.172706: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.172929: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.173118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2021-11-18 16:02:09.173152: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-11-18 16:02:09.668270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-18 16:02:09.668312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 1 
2021-11-18 16:02:09.668317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N Y 
2021-11-18 16:02:09.668321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1:   Y N 
2021-11-18 16:02:09.668600: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.668881: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.669121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.669358: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.669588: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.669816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 46718 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:2c:00.0, compute capability: 8.6)
2021-11-18 16:02:09.670168: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 16:02:09.670374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 46101 MB memory) -> physical GPU (device: 1, name: NVIDIA RTX A6000, pci bus id: 0000:41:00.0, compute capability: 8.6)
Traceback (most recent call last):
  File "train.py", line 36, in <module>
    model = object_detector.create(train_data=train_data, model_spec=spec, validation_data=validation_data, epochs=10000, batch_size=1, train_whole_model=True)
  File "/home/dev/Tensorflow/workspace/tpu_vision/venv/lib/python3.8/site-packages/tensorflow_examples/lite/model_maker/core/task/object_detector.py", line 287, in create
    object_detector.train(train_data, validation_data, epochs, batch_size)
  File "/home/dev/Tensorflow/workspace/tpu_vision/venv/lib/python3.8/site-packages/tensorflow_examples/lite/model_maker/core/task/object_detector.py", line 139, in train
    raise ValueError('The size of the train_data (%d) couldn\'t be smaller '
ValueError: The size of the train_data (0) couldn't be smaller than batch_size (1). To solve this problem, set the batch_size smaller or increase the size of the train_data.

If I comment out everything below the print statements in the code, I get the following:

(venv) dev@trainingpc:~/Tensorflow/workspace/tpu_vision$ python3 train.py 
2021-11-18 15:46:17.698405: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
train count: 0
validation count: 0
test count: 0

So to me, it looks like the DataLoader functions may not be working as intended, but I have tried my hardest to make sure that my data is all correct. I'm really struggling to find what could be wrong here. Any advice would be much appreciated.

I had the same problem, and in my case the cause was that the directory paths were wrong. Take

test_data = object_detector.DataLoader.from_pascal_voc("C:\project\study\AI\face_detect\test_set\image_dir","C:\project\study\AI\face_detect\test_set\Annotations",['licheng','chengjin','lipeng'])

as an example: the size of test_data was 0 until I changed it to

test_data = object_detector.DataLoader.from_pascal_voc("C:\\project\\study\\AI\\face_detect\\test_set\\image_dir","C:\\project\\study\\AI\\face_detect\\test_set\\Annotations",['licheng','chengjin','lipeng'])

I don't know whether this is useful for you.
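
A likely reason the first form fails: in an ordinary Python string literal a backslash starts an escape sequence, so the "\f" in "\face_detect" becomes a form feed and the "\t" in "\test_set" becomes a tab, and the string no longer names a real directory. Here is a minimal sketch using a shortened piece of the path above; nothing is read from disk, and the variable names are only for illustration:

```python
from pathlib import PureWindowsPath

# Ordinary literal: "\f" is a form feed and "\t" is a tab, not path separators.
broken = "AI\face_detect\test_set"

# Doubled backslashes and a raw string both keep the backslashes literally.
escaped = "AI\\face_detect\\test_set"
raw = r"AI\face_detect\test_set"

print(repr(broken))                      # shows the control characters
print("\t" in broken, "\f" in broken)    # both escapes ended up in the string
print(escaped == raw)                    # the two safe spellings are identical

# PureWindowsPath splits on backslashes on any OS, so the parts show the damage.
print(PureWindowsPath(broken).parts)
print(PureWindowsPath(raw).parts)
```

Raw strings (r"...") or forward slashes are usually less error-prone than doubling every backslash by hand.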

OP is using Linux, and it seems that on Linux DataLoader.from_pascal_voc() requires the image and annotation directory strings to have no trailing slashes. For example:

train_images_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/training/images'
train_annotations_dir = '/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/training/annotations'

Now when you call from_pascal_voc(), you should have data:

train_data = object_detector.DataLoader.from_pascal_voc(train_images_dir, train_annotations_dir, label_map=label_map)
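
A defensive way to apply this, sketched with the standard library only: strip trailing separators before passing the paths to the loader, and count the files first so that a wrong path or an empty directory shows up before training starts. The helper names clean_dir and count_files are hypothetical and not part of tflite-model-maker, and the extensions to check are an assumption based on the JPG and XML files described in the question.

```python
import os


def clean_dir(path):
    """Return the path without a trailing separator (hypothetical helper)."""
    # normpath drops a trailing slash and collapses duplicate separators.
    return os.path.normpath(path)


def count_files(directory, extensions):
    """Count files in directory whose names end with one of the extensions."""
    if not os.path.isdir(directory):
        raise FileNotFoundError(f'Not a directory: {directory}')
    suffixes = tuple(ext.lower() for ext in extensions)
    return sum(
        1 for name in os.listdir(directory)
        if name.lower().endswith(suffixes)
    )


# Pure string operation, so this runs even if the directory does not exist.
print(clean_dir('/home/dev/Tensorflow/workspace/tpu_vision/tpu_images/training/images/'))
```

With the variables from the question, count_files(clean_dir(train_images_dir), ['.jpg', '.jpeg']) and count_files(clean_dir(train_annotations_dir), ['.xml']) should both return the same non-zero number before from_pascal_voc() is called. If they do and len(train_data) is still 0, the problem is probably in the annotations themselves rather than in the paths.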
