I am writing a neural-network classifier for the MNIST dataset. I first loaded the data manually, using loops and indexes for the epochs and batches. In a tutorial I saw someone using torch.utils.data.DataLoader for this exact task, so I changed my code to use a DataLoader instead. This resulted in major differences in training time.
I tried to narrow the issue down with benchmarks. I always benchmarked on both CPU (i7 8700k) and GPU (1080ti), and the data is stored on my SSD (970 evo).
I first compared Batch Gradient Descent with and without DataLoader, and then Mini-Batch Gradient Descent with and without DataLoader. The results were rather confusing to me.
| | BGD | BGD with DL | MB-GD | MB-GD with DL |
|-----------------|-------------|-------------|-------------|---------------|
| Time on CPU | 00:00:56.70 | 00:05:59.31 | 00:01:31.29 | 00:07:46.56 |
| Accuracy on CPU | 82.47 | 33.44 | 94.84 | 87.67 |
| Time on GPU | 00:00:15.89 | 00:05:41.79 | 00:00:17.48 | 00:05:37.33 |
| Accuracy on GPU | 82.3 | 30.66 | 94.88 | 87.74 |
| Batch Size | M | M | 500 | 500 |
| Epoch | 100 | 100 | 100 | 100 |
This is the code using DataLoader, stripped down to essentials:

    from torch.utils.data import DataLoader

    num_epoch = 100
    train_loader = DataLoader(batch_size=500, shuffle=False, dataset=dataset_train)

    for epoch in range(num_epoch):
        for i, (images, labels) in enumerate(train_loader):
            # Flatten each 28x28 image into a 784-element vector.
            images = images.view(-1, 28 * 28)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

versus the code using the loop:

    num_epoch = 100
    batch_size = 500
    num_batch = len(dataset_train) // batch_size

    for epoch in range(num_epoch):
        for batch_idx in range(num_batch):
            start = batch_idx * batch_size
            end = (batch_idx + 1) * batch_size
            # Slices the raw tensors directly, bypassing the dataset's transforms.
            images = dataset_train.data[start:end].view(-1, 28 * 28)
            labels = dataset_train.targets[start:end]
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

I would expect the DataLoader to perform at least somewhere close to the loop in terms of time and accuracy, not 10 times slower. I am also confused about why the DataLoader affects the model accuracy.
Am I using the DataLoader wrong, or is this just the wrong use case for it, and a loop is better suited for what I am doing?
EDIT: here are two fiddles containing the full code of the loop and the dataloader variant
EDIT: I believe I might have figured out how to fix my main problem, the performance difference between the DataLoader and the loop. By setting the num_workers parameter of the loader to 8, I managed to drive down the time for mini-batch with DL on GPU to around 1 minute. While this is definitely better than 5 minutes, it's still bad, considering that mini-batch with DL on GPU is only on par with mini-batch with the loop on CPU.
According to the docs, transforms.ToTensor() takes a PIL Image or np.ndarray in the range [0, 255] as input and converts it to a torch.FloatTensor in the range [0.0, 1.0] if the np.ndarray has dtype=np.uint8 or the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1).
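Rescaling and changing the data type affect model accuracy. Your loop slices dataset_train.data directly, so the model sees the raw values in [0, 255] and the transform is never applied, while the DataLoader fetches every sample through the dataset and applies the transform to it. The DataLoader is therefore also doing more work than your loop: it retrieves the samples one at a time, transforms each of them, and then collates them into a batch. Hence the difference in timings.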
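To make the difference concrete, here is a minimal sketch using synthetic data in place of MNIST. The ScaledDataset class and the raw_images / raw_labels tensors are made up for illustration, and the per-sample rescaling only imitates the value scaling that ToTensor() performs (it does not add a channel dimension). Assuming that setup, the raw slice used by the loop differs from the DataLoader batch in both dtype and value range, and applying the same scaling once to the slice reproduces the DataLoader batch:

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Synthetic stand-in for dataset_train.data / dataset_train.targets:
# 1000 fake 28x28 uint8 "images" with values in [0, 255].
torch.manual_seed(0)
raw_images = torch.randint(0, 256, (1000, 28, 28), dtype=torch.uint8)
raw_labels = torch.randint(0, 10, (1000,))


class ScaledDataset(Dataset):
    """Hypothetical dataset that rescales each sample, imitating ToTensor()."""

    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Per-sample work: uint8 in [0, 255] -> float32 in [0.0, 1.0].
        return self.images[idx].float().div(255), self.labels[idx]


batch_size = 500
loader = DataLoader(
    ScaledDataset(raw_images, raw_labels), batch_size=batch_size, shuffle=False
)
loader_images, loader_labels = next(iter(loader))

# What the manual loop feeds the model: raw uint8 values, no transform.
loop_images = raw_images[:batch_size]

# The same preprocessing applied once to the whole slice.
scaled_loop_images = loop_images.float().div(255)

print(loop_images.dtype, loader_images.dtype)
print(torch.equal(scaled_loop_images, loader_images))
```

If the per-sample overhead matters, one option under these assumptions is to preprocess the full tensor once up front and then slice it (or wrap it in a TensorDataset), so that both variants train on identical inputs.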
PS: You should shuffle your training data when doing mini-batch training, for example by passing shuffle=True to the DataLoader.
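As a rough sketch of what shuffling looks like in both variants: the tensors below are synthetic placeholders (the features are all zeros and the "labels" are just sample indices, so the visiting order can be inspected). With the DataLoader, shuffle=True draws a new permutation each epoch; in a manual loop, one way to get the same effect is torch.randperm, redrawn at the start of every epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder data: zero "images", and labels that simply record the sample index.
features = torch.zeros(1000, 28 * 28)
indices = torch.arange(1000)
dataset = TensorDataset(features, indices)
batch_size = 500

# DataLoader variant: shuffle=True reshuffles the samples every epoch.
shuffled_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
loader_order = torch.cat([labels for _, labels in shuffled_loader])

# Manual-loop variant: draw a permutation (once per epoch), then slice it.
perm = torch.randperm(len(dataset))
manual_batches = []
for start in range(0, len(dataset), batch_size):
    batch_idx = perm[start:start + batch_size]
    images, labels = features[batch_idx], indices[batch_idx]
    manual_batches.append(labels)
manual_order = torch.cat(manual_batches)

# Every sample is still visited exactly once per epoch, just in a random order.
print(loader_order[:5], manual_order[:5])
```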