
Training a deep convnet with a small input size

I am very new to the field of deep learning. While I understand how it works, and I have managed to run some tutorials with the Caffe library, I still have some questions for which I was unable to find satisfying answers.

My questions are as follows:

  1. Consider AlexNet, which takes a 227 x 227 image as input in Caffe (I think in the original paper it is 224), and whose FC7 layer produces a 4096-D feature vector. Now if I want to detect a person using a sliding window of size 32 x 64, then each window will be upsized to 227 x 227 before going through AlexNet. That is a lot of computation. Is there a better way to handle this 32 x 64 window?
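To make the cost concrete, here is a back-of-the-envelope sketch (not a benchmark) of how much each window is inflated by the resize step:

```python
# Every 32 x 64 detection window is resized up to AlexNet's 227 x 227 input,
# so the number of pixels fed through the network grows by a large factor.
window_pixels = 32 * 64      # 2048 pixels per detection window
input_pixels = 227 * 227     # 51529 pixels after upsizing
blowup = input_pixels / window_pixels
print(f"each window is blown up ~{blowup:.0f}x before entering AlexNet")
# prints: each window is blown up ~25x before entering AlexNet
```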

  2. My approach to this 32 x 64 window detector is to build my own network with a few convolution, pooling, ReLU, and FC layers. While I understand how to build the architecture, I am afraid that the model I train might have issues such as overfitting. A friend of mine told me to pretrain my network using AlexNet, but I don't know how to do this. I cannot get hold of him to ask right now, but does anyone here think what he suggested is doable? I am confused. I was also thinking of training my network, which takes 32 x 64 input, on ImageNet: since the network is just a feature extractor, I feel that ImageNet would provide enough variety of images for good learning. Please correct me if I am wrong, and if possible guide me onto the correct path.

  3. This question is just about Caffe. Say I compute features using HOG and I want to use the GPU version of the network to train a classifier. Is that possible? I was thinking of using an HDF5 layer to read the HOG feature vectors and pass them to a fully connected layer for training. Is that possible?

I would appreciate any help, or links to papers, that may help me understand the ideas behind convnets.

  1. For a CNN that contains fully connected layers, the input size cannot be changed: if the network was trained on 224x224 images, then the input has to be 224x224. Look at this question.
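One way to see this concretely: the first fully connected layer (fc6) has a weight matrix whose input dimension is fixed at training time by the flattened size of the last conv feature map. A minimal sketch of the spatial-size arithmetic through AlexNet's conv/pool stack (layer parameters taken from the standard Caffe AlexNet definition; 127 is just an arbitrary alternative input size for comparison):

```python
# Standard conv/pool output-size formula: floor((n + 2p - k) / s) + 1
def out_size(n, k, s, p=0):
    return (n + 2 * p - k) // s + 1

def alexnet_spatial(n):
    n = out_size(n, 11, 4)    # conv1: 11x11, stride 4
    n = out_size(n, 3, 2)     # pool1: 3x3, stride 2
    n = out_size(n, 5, 1, 2)  # conv2: 5x5, pad 2
    n = out_size(n, 3, 2)     # pool2
    n = out_size(n, 3, 1, 1)  # conv3: 3x3, pad 1
    n = out_size(n, 3, 1, 1)  # conv4
    n = out_size(n, 3, 1, 1)  # conv5
    n = out_size(n, 3, 2)     # pool5
    return n

s = alexnet_spatial(227)
print(s, s * s * 256)         # prints: 6 9216 -> fc6 expects exactly 9216 inputs
print(alexnet_spatial(127))   # a different input size -> a different flattened
                              # size, which no longer matches fc6's weights
```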

  2. Training your own network from scratch will require a huge amount of data; AlexNet was trained on over a million images. If you have that much training data (you can download the ImageNet training set), then go ahead. Otherwise you might want to look into finetuning.
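Finetuning in Caffe (which is probably what your friend meant) means initializing your network from a pretrained snapshot instead of from random weights. A hedged sketch of the usual recipe, assuming the standard BVLC AlexNet files; the layer name `fc8_person` and the directory paths are made-up examples:

```shell
# 1. In train_val.prototxt, rename the last FC layer (e.g. fc8 -> fc8_person)
#    and set its num_output to your number of classes. A renamed layer gets
#    fresh random weights; layers whose names match the snapshot are copied.
# 2. Lower base_lr in solver.prototxt so the copied weights change slowly.
# 3. Launch training from the pretrained weights:
./build/tools/caffe train \
    --solver=models/finetune_person/solver.prototxt \
    --weights=models/bvlc_alexnet/bvlc_alexnet.caffemodel
```

Note, though, that AlexNet's pretrained weights assume its 227 x 227 input, so to reuse them for a 32 x 64 detector you would still have to resize your windows (see point 1).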

  3. Yes, you can use an HDF5 layer to read the HOG feature vectors for training.
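A minimal sketch of how that might look: pack the precomputed HOG features into an HDF5 file, which Caffe's `HDF5Data` layer can then feed into fully connected layers. The dataset names `data` and `label` are what the layer expects; the feature length 1764 and the file names are example values, and the random arrays stand in for your real HOG vectors:

```python
import h5py
import numpy as np

# Example values: 100 samples of 1764-D HOG features, binary labels.
n_samples, feat_dim = 100, 1764
features = np.random.rand(n_samples, feat_dim).astype(np.float32)
labels = np.random.randint(0, 2, size=(n_samples,)).astype(np.float32)

# Caffe's HDF5Data layer reads float32 datasets named "data" and "label".
with h5py.File("hog_train.h5", "w") as f:
    f.create_dataset("data", data=features)
    f.create_dataset("label", data=labels)

# In the network definition, the layer points at a text file that lists
# the .h5 paths (one per line), e.g.:
#   layer { name: "data" type: "HDF5Data" top: "data" top: "label"
#           hdf5_data_param { source: "train_h5_list.txt" batch_size: 32 } }
```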
