
VOCBboxDataset returns incorrect dataset size when applied to my dataset

I have a dataset of 250 images and 250 annotation files with two classes: ball and player. The folder also has three text files, train.txt, val.txt, and test.txt, containing the lists of training, validation, and test images respectively.

import os
import xml.etree.ElementTree as ET

import numpy as np
from chainercv.datasets import VOCBboxDataset

bball_labels = ('ball', 'player')

class BBall_dataset(VOCBboxDataset):
  def _get_annotations(self, i):
    id_ = self.ids[i]
    anno = ET.parse(
        os.path.join(self.data_dir, 'Annotations', id_ + '.xml'))
    bbox = []
    label = []
    difficult = []
    for obj in anno.findall('object'):
      bndbox_anno = obj.find('bndbox')
      bbox.append([int(bndbox_anno.find(tag).text) - 1
                   for tag in ('ymin', 'xmin', 'ymax', 'xmax')])
      name = obj.find('name').text.lower().strip()
      label.append(bball_labels.index(name))
    bbox = np.stack(bbox).astype(np.float32)
    label = np.stack(label).astype(np.int32)
    # Note: nothing is ever appended to `difficult`, so this is always an
    # empty array. `np.bool` is deprecated in recent NumPy; use `bool`.
    difficult = np.array(difficult, dtype=bool)
    return bbox, label, difficult
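For context, a VOC-style dataset typically builds its list of image ids by reading the split file line by line, which is where the reported length comes from. The sketch below is illustrative (the helper name `load_ids` is mine, not chainercv's exact internals) and shows that blank lines in the split file still become entries:

```python
import os
import tempfile

# Illustrative sketch: every line of the split file becomes one id,
# including blank lines (which strip() turns into empty strings).
def load_ids(id_list_file):
    with open(id_list_file) as f:
        return [line.strip() for line in f]

# A split file with two blank lines left over from editing:
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('img_001\nimg_002\n\nimg_003\n\n')
    path = f.name

ids = load_ids(path)
print(len(ids))  # 5 entries, although only 3 are real image ids
print(ids)       # ['img_001', 'img_002', '', 'img_003', '']
os.unlink(path)
```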

Out of the 250 images I put 170 in train, 70 in val, and 10 in test. But when I print the lengths of the train, val, and test datasets I always get train+12, val+3, and test unchanged; in this case it shows 182, 73, and 10 for train, val, and test. The test value never changes, but the train and val values are always inflated by 12 and 3.

valid_dataset = BBall_dataset('BasketballDataset', 'val')
test_dataset = BBall_dataset('BasketballDataset', 'test')
train_dataset = BBall_dataset('BasketballDataset', 'train') 

print('Number of images in "train" dataset:', len(train_dataset))
print('Number of images in "valid" dataset:', len(valid_dataset))
print('Number of images in "test" dataset:', len(test_dataset))

Number of images in "train" dataset: 182
Number of images in "valid" dataset: 73
Number of images in "test" dataset: 10

Why does this happen, and how can I prevent it? Also, does it affect my training process in any way?

train.txt link ( https://imgur.com/B1Gszfi ) val.txt link ( https://imgur.com/kOcIZ5h )

The issue turned out to be blank lines in the text files. The image lists had been cut, copied, and pasted within the same file in Notepad, which left gaps. Notepad does not show line numbers, so the gaps were easy to miss, but they become visible when viewing the files on GitHub, where line numbers are shown: the file originally listed 182 images and was later cut down to 170, yet it still spans 182 lines. The dataset creation code reads every line of the text file, blank lines included, so it counts 182 ids instead of 170. Make sure the number of lines matches the number of images to avoid this problem.
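A one-off cleanup can rewrite each split file with the blank lines stripped out. A minimal sketch (the helper `clean_split_file` is mine, not part of chainercv; the `ImageSets/Main` location in the usage comment follows the standard VOC layout that VOCBboxDataset expects, so adjust it if your files live elsewhere):

```python
import os

# Hypothetical helper: rewrite a split file with blank lines removed
# and return the number of remaining (real) image ids.
def clean_split_file(path):
    with open(path) as f:
        ids = [line.strip() for line in f if line.strip()]
    with open(path, 'w') as f:
        f.write('\n'.join(ids) + '\n')
    return len(ids)

# Usage, assuming the standard VOC directory layout:
# for split in ('train', 'val', 'test'):
#     n = clean_split_file(os.path.join(
#         'BasketballDataset', 'ImageSets', 'Main', split + '.txt'))
#     print(split, n)
```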
