
DataLoader num_workers vs torch.set_num_threads

Is there a difference between the parallelization that takes place between these two options? I'm assuming num_workers is solely concerned with parallelizing the data loading. But is setting torch.set_num_threads meant for parallelizing training in general? I'm trying to understand the difference between these options. Thanks!

The num_workers argument of the DataLoader specifies how many parallel worker processes to use to load the data and run all the transformations. If you are loading large images or have expensive transformations, then you can end up in a situation where the GPU processes your data quickly but the DataLoader is too slow to continuously feed it. In that case, setting a higher number of workers helps. I typically increase this number until each epoch is fast enough. Also, a side tip: if you are using Docker, you usually want to set the shared-memory size (--shm-size) to roughly 1x to 2x the number of workers in GB for a large dataset like ImageNet.
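A minimal sketch of the idea, with an illustrative toy dataset and worker count (SlowDataset and the value 4 are assumptions, not from the original post): the workers prepare batches in subprocesses so the training loop isn't starved waiting on data.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SlowDataset(Dataset):
    """Toy dataset that simulates an expensive per-sample transformation."""
    def __len__(self):
        return 1024

    def __getitem__(self, idx):
        x = torch.randn(3, 224, 224)            # stand-in for a decoded image
        x = (x - x.mean()) / (x.std() + 1e-8)   # stand-in for an expensive transform
        return x, idx % 10

if __name__ == "__main__":
    # Increase num_workers until one epoch is fast enough for your GPU;
    # 4 here is just an illustrative starting point.
    loader = DataLoader(SlowDataset(), batch_size=64, shuffle=True,
                        num_workers=4, pin_memory=True)
    for images, labels in loader:
        pass  # training step would go here
```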

torch.set_num_threads specifies how many threads to use for parallelizing CPU-bound tensor operations. If you are using the GPU for most of your tensor operations, then this setting doesn't matter much. However, if you keep tensors on the CPU and do a lot of operations on them, then you might benefit from setting this. The PyTorch docs, unfortunately, don't specify which operations benefit from it, so watch your CPU utilization and adjust the number until you can max it out.
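A small sketch of how you might probe this on your own machine (the thread counts and matrix size are illustrative assumptions): set the intra-op thread count, then time a CPU-heavy operation to see how throughput changes.

```python
import time
import torch

def time_matmul(n_threads: int, size: int = 2048) -> float:
    torch.set_num_threads(n_threads)   # threads used for CPU-bound ops like matmul
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    start = time.perf_counter()
    for _ in range(5):
        a @ b
    return time.perf_counter() - start

if __name__ == "__main__":
    for n in (1, 2, 4):                # adjust to your core count
        print(f"{n} threads: {time_matmul(n):.3f} s")
```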
