
DataLoader num_workers vs torch.set_num_threads


Is there a difference between the parallelization these two options provide? I'm assuming num_workers is solely concerned with parallelizing the data loading, but is torch.set_num_threads for parallelizing training in general? I'm trying to understand the difference between these options. Thanks!

The num_workers argument of the DataLoader specifies how many parallel worker processes to use to load the data and run all the transformations. If you are loading large images or applying expensive transformations, you can end up in a situation where the GPU processes your data quickly but the DataLoader is too slow to feed it continuously. In that case, setting a higher number of workers helps. I typically increase this number until my epoch step is fast enough. A side tip: if you are using Docker with a large dataset like ImageNet, you usually want to set the shared-memory size (shm) to roughly 1x to 2x the number of workers, in GB.
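A minimal sketch of the above, using a hypothetical ToyDataset in place of a real image pipeline; num_workers=2 makes the DataLoader prepare batches in background worker processes:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """A toy dataset standing in for expensive loading/transforms."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # A real pipeline would read an image from disk and apply
        # transforms here; we just return a small tensor.
        return torch.full((3,), float(idx))

# Two worker processes assemble batches in the background while the
# main process (and the GPU, if any) consumes them.
loader = DataLoader(ToyDataset(), batch_size=4, num_workers=2)
batches = [batch for batch in loader]
```

Increasing num_workers only helps while data loading is the bottleneck; past that point, extra workers just add memory and startup overhead.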

torch.set_num_threads specifies how many threads to use for parallelizing CPU-bound tensor operations. If you run most of your tensor operations on the GPU, this setting doesn't matter much. However, if you keep tensors on the CPU and perform many operations on them, you might benefit from tuning it. Unfortunately, the PyTorch docs don't specify which operations benefit from this, so watch your CPU utilization and adjust the number until you can max it out.
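A short sketch of the second knob; the thread count of 4 and the matmul sizes are arbitrary choices for illustration:

```python
import torch

# Cap the number of intra-op threads used for CPU-bound tensor math.
# Call this early, before any heavy CPU work runs.
torch.set_num_threads(4)

# A CPU-bound op such as a large matmul can now use up to 4 threads;
# watch CPU utilization and tune the number until you max it out.
a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)
c = a @ b
```

Note this is separate from DataLoader workers: set_num_threads governs threads inside individual CPU tensor ops, while num_workers governs how many processes feed data to the training loop.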

