
Why is my DataLoader so much slower than a for loop?

I am writing a neural-network-based classifier for the MNIST dataset. I first loaded the data manually, using loops and indexing for the epochs and batches. In a tutorial I saw someone use torch.utils.data.DataLoader for this exact task, so I changed my code to use a DataLoader instead. This resulted in major differences in the duration of the learning process.

I've tried to troubleshoot this issue by narrowing it down with benchmarks. I always benchmarked on both CPU (i7-8700K) and GPU (1080 Ti), with the data stored on my SSD (970 EVO).

I first compared batch gradient descent (BGD) with and without the DataLoader, and then mini-batch gradient descent (MB-GD) with and without the DataLoader. The results were rather confusing to me.

|                     | BGD         | BGD with DL | MB-GD       | MB-GD with DL |
|---------------------|-------------|-------------|-------------|---------------|
| Time on CPU         | 00:00:56.70 | 00:05:59.31 | 00:01:31.29 | 00:07:46.56   |
| Accuracy on CPU (%) | 82.47       | 33.44       | 94.84       | 87.67         |
| Time on GPU         | 00:00:15.89 | 00:05:41.79 | 00:00:17.48 | 00:05:37.33   |
| Accuracy on GPU (%) | 82.3        | 30.66       | 94.88       | 87.74         |
| Batch size          | M (full)    | M (full)    | 500         | 500           |
| Epochs              | 100         | 100         | 100         | 100           |

(M = size of the full training set; DL = DataLoader.)

This is the code using the DataLoader, stripped down to the essentials:

from torch.utils.data import DataLoader

num_epoch = 100
train_loader = DataLoader(dataset=dataset_train, batch_size=500, shuffle=False)

for epoch in range(num_epoch):
    for i, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 28 * 28)  # flatten each 28x28 image into a vector
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

versus the code using the loop:

num_epoch = 100
batch_size = 500
num_batch = len(dataset_train) // batch_size

for epoch in range(num_epoch):
    for batch_idx in range(num_batch):
        # Slice the raw data/target tensors directly for this mini-batch
        images = dataset_train.data[batch_idx*batch_size:(batch_idx+1)*batch_size].view(-1, 28 * 28)
        labels = dataset_train.targets[batch_idx*batch_size:(batch_idx+1)*batch_size]
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

I would expect the DataLoader to at least perform somewhere close to the loop in terms of time and performance, not 10 times slower. I am also confused as to why the DataLoader affects the model accuracy.

Am I using the DataLoader wrong, or is this just the wrong use case for it, and a loop is better suited for what I am doing?

EDIT: Here are two fiddles containing the full code for the loop and the DataLoader variants.

EDIT: I believe I may have figured out how to fix my main problem, the performance gap between the DataLoader and the loop. By setting the loader's num_workers parameter to 8, I managed to bring the time for mini-batch with DataLoader on GPU down to around 1 minute. While that is definitely better than 5 minutes, it's still bad, considering that mini-batch with DataLoader on GPU is only on par with mini-batch with the loop on CPU.
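For reference, a minimal sketch of that change, assuming the loader from my original snippet:

# Same loader as before, but with 8 worker processes so that batch
# loading and transforming overlap with the training step.
# num_workers=8 is what I used; the best value is machine-dependent.
train_loader = DataLoader(dataset=dataset_train,
                          batch_size=500,
                          shuffle=False,
                          num_workers=8)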

transforms.ToTensor() takes a PIL Image or an np.ndarray with values in the range [0, 255] as input and converts it to a torch.FloatTensor in the range [0.0, 1.0], provided the np.ndarray has dtype=np.uint8 or the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) (docs).

Rescaling the inputs and changing their data type affect model accuracy. Your loop reads dataset_train.data directly, which bypasses the dataset's transform, while the DataLoader applies ToTensor() to every sample. The DataLoader is also doing more operations per batch than your loop (per-sample transforms and collation), hence the difference in timings.
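A quick way to see this yourself, assuming dataset_train was created with transform=transforms.ToTensor() and train_loader is the loader from the question:

# The raw tensors bypass the transform pipeline entirely:
print(dataset_train.data.dtype)   # torch.uint8, values in [0, 255]

# The DataLoader applies ToTensor() to every sample it yields:
images, labels = next(iter(train_loader))
print(images.dtype)               # torch.float32, values in [0.0, 1.0]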

PS: You should shuffle your training data when doing mini-batch training.
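A sketch of that change, using the loader from the question:

# shuffle=True reshuffles the sample order at the start of every epoch
train_loader = DataLoader(dataset=dataset_train, batch_size=500, shuffle=True)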
