
Multiprocessing: More processes than cpu.count

Note: I "forayed" into the land of multiprocessing two days ago, so my understanding is very basic.

I am writing an application for uploads to Amazon S3 buckets. When a file is large (100 MB), I have implemented parallel uploads using Pool from the multiprocessing module. I am using a machine with a Core i7, and cpu_count reports 8. I was under the impression that if I do pool = Pool(processes=6), I use 6 cores, the file is uploaded in parts, and the uploads for the first 6 parts begin simultaneously. To see what happens when processes is greater than cpu_count, I entered 20 (implying that I want to use 20 cores). To my surprise, instead of getting a block of errors, the program began to upload 20 parts simultaneously (I used a smaller chunk size to make sure there were plenty of parts). I don't understand this behavior. I have only 8 cores, so how can the program accept an input of 20? When I say processes=6, does it actually use 6 threads? That would be the only explanation for 20 being a valid input, since there can be thousands of threads. Can someone please explain this to me?

Edit:

I 'borrowed' the code from here. I have changed it only slightly: I ask the user for the number of cores to use instead of setting parallel_processes to 4.

The number of processes running concurrently on your computer is not limited by the number of cores. In fact, you probably have hundreds of programs running on your computer right now, each with its own process. To make this work, the OS assigns one of your 8 processors to each process or thread only temporarily: at some point the process may be stopped and another one will take its place. See "What is the difference between concurrent programming and parallel programming?" if you want to find out more.
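To see this for yourself, here is a minimal sketch (not the uploader code from the question) that starts a pool larger than cpu_count and counts how many distinct worker processes actually handled tasks. The names report_pid and count_distinct_workers are made up for illustration, and the short sleep is only there to give each worker a chance to pick up a task.

```python
import multiprocessing as mp
import os
import time


def report_pid(task_id):
    # Sleep briefly so that one fast worker does not grab every task.
    time.sleep(0.2)
    return os.getpid()


def count_distinct_workers(num_processes, num_tasks):
    # Pool accepts any positive number of processes, not just <= cpu_count.
    with mp.Pool(processes=num_processes) as pool:
        pids = pool.map(report_pid, range(num_tasks), chunksize=1)
    return len(set(pids))


if __name__ == "__main__":
    cores = mp.cpu_count()
    requested = cores + 12  # e.g. 20 on an 8-core machine
    used = count_distinct_workers(requested, requested * 2)
    print(f"cores: {cores}, pool size: {requested}, worker PIDs seen: {used}")
```

The pool is created without any error. The OS simply time-slices the extra processes across the cores that are available. The number of PIDs seen can be lower than the pool size if some workers happen to take several tasks.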

Edit: Assigning more processes in your uploading example may or may not make sense. Reading from disk and sending over the network are normally blocking operations in Python. A process that is waiting for its chunk of data to be read or sent can be halted so that another process may start its I/O. On the other hand, with too many processes, either file I/O or network I/O will become a bottleneck, and your program will slow down because of the additional overhead needed for process switching.
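A rough way to see the first half of this trade-off is to replace the upload with a sleep, which blocks without using the CPU, much like waiting on a socket. This is only a sketch under that assumption: fake_upload and upload_all are hypothetical names, and a sleep does not model limited bandwidth or disk throughput, so it cannot show the slowdown that real I/O contention would cause.

```python
import multiprocessing as mp
import time


def fake_upload(part_number):
    # Stand-in for a blocking network call (assumption: 0.25 s per part).
    time.sleep(0.25)
    return part_number


def upload_all(num_processes, num_parts):
    # Returns the finished part numbers and the wall-clock time taken.
    start = time.perf_counter()
    with mp.Pool(processes=num_processes) as pool:
        done = pool.map(fake_upload, range(num_parts), chunksize=1)
    return done, time.perf_counter() - start


if __name__ == "__main__":
    print(f"cores: {mp.cpu_count()}")
    for size in (4, 16):
        done, elapsed = upload_all(size, num_parts=16)
        print(f"{size:2d} processes: {len(done)} parts in {elapsed:.2f}s")
```

With blocking tasks like these, the larger pool usually finishes sooner even when it exceeds the core count, because most processes are waiting rather than computing. With a real upload, the gain stops once the network link or the disk is saturated, so it is worth measuring a few pool sizes rather than assuming that more is better.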
