
Python multiprocessing: reading data from disk

This has confused me for a long time. My program has two processes that both read data from disk, and the disk's maximum read speed is 10 MB/s.
1. If both processes each read 10 MB of data, do the two processes take the same time as one process reading twice?
2. If both processes each read 5 MB, the two processes take 1 s to read, and one process reading twice also takes 1 s. I know multiprocessing can save time on I/O, but if the I/O takes the same time either way, how does multiprocessing save time?

It's not possible to increase disk read speed by adding more threads or processes. With 2 readers you will get at best 1/2 the speed per reader (in practice even less), with 3 readers 1/3 the speed, and so on.
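To make the arithmetic concrete, here is a minimal best-case model using the question's 10 MB/s figure. The function names are hypothetical, and the model assumes the disk's bandwidth is split evenly with no extra seek cost, which real disks will not achieve.

```python
DISK_MB_PER_S = 10.0  # maximum read speed stated in the question


def per_reader_speed(readers, disk_mb_per_s=DISK_MB_PER_S):
    """Best-case speed each reader sees when the disk is shared evenly."""
    return disk_mb_per_s / readers


def read_time_seconds(mb_per_reader, readers, disk_mb_per_s=DISK_MB_PER_S):
    """Best-case wall time for all readers to finish.

    The disk has to deliver the same total number of megabytes whether
    the reads run in parallel or one after another, so only the total matters.
    """
    total_mb = mb_per_reader * readers
    return total_mb / disk_mb_per_s


print(per_reader_speed(2))        # two readers share the 10 MB/s
print(read_time_seconds(10, 2))   # case 1: two readers, 10 MB each
print(read_time_seconds(5, 2))    # case 2: two readers, 5 MB each
```

Under this model, two processes reading 10 MB each need 2 s and two processes reading 5 MB each need 1 s, the same as one process doing both reads back to back.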

With disk I/O it is the difference between sequential and random access speed that is really important. For example, sequential read speed can be 10 MB/s while random read speed is just 10 KB/s. This is the case even with the latest SSDs (although the ratio may be less pronounced).

For that reason you should prefer to read from disk sequentially from only one thread at a time. Reading the file in 2 threads in parallel will not only reduce the speed of each read by half, but will reduce it further because of non-sequential (interleaved) disk access.
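One way to follow this advice is to keep a single sequential reader and hand the chunks it produces to workers. The sketch below is only an illustration: `read_chunks`, `checksum`, `process_file`, and the 1 MiB chunk size are made-up names and values, and the CRC32 is a stand-in for whatever work the program really does on the data.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed value, tune for your disk


def read_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the file's contents in order; this is the only code touching the disk."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def checksum(chunk):
    """Placeholder for real per-chunk work (hypothetical workload)."""
    return zlib.crc32(chunk)


def process_file(path, max_workers=2, chunk_size=CHUNK_SIZE):
    """Read sequentially in the calling thread and process chunks in workers.

    Note: this submits every chunk before collecting results, so the whole
    file may sit in memory; a real program would bound the queue.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(checksum, chunk)
                   for chunk in read_chunks(path, chunk_size)]
        return [future.result() for future in futures]
```

For pure-Python, CPU-bound work, `ProcessPoolExecutor` from the same module could be swapped in for `ThreadPoolExecutor`; the reading would still happen in one place.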


Note, however, that 10 MB is really not much; modern OSes will typically prefetch the entire file into the cache, and any subsequent reads will appear instantaneous.
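The caching effect can be observed with a simple timing helper. This is a rough sketch, and `timed_read` is a hypothetical name; the numbers depend entirely on the OS, the disk, and whether the file is already in the cache.

```python
import time


def timed_read(path):
    """Read the whole file and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        size = len(f.read())
    return size, time.perf_counter() - start
```

Calling `timed_read` twice on a file that has not been touched recently will usually show a much shorter second read. A file that was just written is probably cached already, so both reads may look fast.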

