
Image classification in Pytorch

I'm working with the facenet-pytorch library in PyTorch, and I want to know:

  1. Should data augmentation be applied to the train dataset or the test dataset?

  2. How many images should I put in the test dataset at minimum? (I've used 2% of the images for the test set.)

  3. I have 21 classes (21 persons' faces) and am using a model pretrained on the vggface2 dataset in evaluation mode. Is that enough for the training and test datasets?

  4. How do I visualize the images in the test dataset to display whether a face matched or not? I tried the code below, but it raises this error:

TypeError: Invalid shape (3, 160, 160) for image data

The shape of the images is: (10, 3, 160, 160)

dataiter = iter(test_loader)
images, labels = next(dataiter)  # dataiter.next() was removed in newer Python/PyTorch
# get predictions
preds = np.squeeze(net(images).data.max(1, keepdim=True)[1].numpy())
images = images.numpy()

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    # batch_size // 2: subplot counts must be integers
    ax = fig.add_subplot(2, batch_size // 2, idx + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx] == labels[idx] else "red"))
  5. How do I take input faces from the webcam after the face is detected (a prediction function)?
import cv2 as cv

cap = cv.VideoCapture(0)
while True:
    ret, frame = cap.read()
    frame = cv.resize(frame, None, fx=0.5, fy=0.5, interpolation=cv.INTER_AREA)
    image = predict_draw_bounding_box(frame)
    cv.imshow('Output', image)
    c = cv.waitKey(1)
    if c == 27:  # Esc key
        break
cap.release()
cv.destroyAllWindows()

But I don't know how to implement the predict_draw_bounding_box function.

Thanks for any advice.

That's a lot of questions; you should probably split them up into multiple questions. In any case, I'll try answering some.

  1. Data augmentation should generally be done on the train dataset only. Typical augmentations include random rotations, resized crops, horizontal flips, cutout, etc. All of these go on the train set only.

     Other than this, off the top of my head, channel normalization is the only transform you usually apply to both the training and test sets: you compute (x - x_mean) / sigma channel-wise, using statistics taken over all images in the dataset.
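     As a minimal sketch of channel-wise normalization (a randomly generated batch stands in for a real dataset here), using PyTorch's channel-first (N, C, H, W) layout:

```python
import numpy as np

# Hypothetical batch standing in for a real dataset: (N, C, H, W).
images = np.random.rand(10, 3, 160, 160).astype(np.float32)

# Per-channel mean and std over batch, height, and width.
mean = images.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, 3, 1, 1)
std = images.std(axis=(0, 2, 3), keepdims=True)

# Apply the same (x - mean) / sigma to train AND test images.
normalized = (images - mean) / std
```

     In practice you would compute mean and std once on the training set and reuse those values for the test set (e.g. via torchvision's transforms.Normalize).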

  2. The percentage of images in your test dataset is entirely empirical, and depends on how many images you actually have. For very large datasets with a million-plus images, small percentages like 2% are okay. However, if your number of images is in the tens of thousands, thousands, or even fewer, it's good practice to keep around 20% as the test set.
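     A simple shuffled 80/20 split can be sketched like this (the sample count is hypothetical; a real dataset would be a list of image paths or a torch Dataset):

```python
import numpy as np

# Hypothetical dataset of 1000 samples, represented by their indices.
n_samples = 1000
rng = np.random.default_rng(0)
indices = rng.permutation(n_samples)

# Keep roughly 20% of the data as the test set for a dataset this small.
n_test = int(0.2 * n_samples)
test_idx, train_idx = indices[:n_test], indices[n_test:]
```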

  3. I can't understand your question.

  4. Your images have the shape (3, 160, 160). That's the channel-first layout used by PyTorch's nn.Module system, but plotting an RGB image in matplotlib requires the channel in the last dimension, i.e. (160, 160, 3). If images is a batch of images of shape (10, 3, 160, 160), then do:

     images = images.numpy()
     images = images.swapaxes(1, 2).swapaxes(2, 3)

     This will reshape it to (10, 160, 160, 3) without disturbing the order of pixels within each image.
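     Equivalently, a single transpose does the same thing; a quick check with a dummy batch:

```python
import numpy as np

# Dummy batch in PyTorch's (N, C, H, W) layout.
images = np.zeros((10, 3, 160, 160))

# Move the channel axis to the end: (N, H, W, C).
plottable = images.transpose(0, 2, 3, 1)
print(plottable.shape)  # (10, 160, 160, 3)
```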

  5. No clue.

