
Is there any benefit to using Python 2.7 multiprocessing to copy files?

I would like to know if there is any benefit to using python2.7's multiprocessing module to asynchronously copy files from one folder to another.

Is disk I/O always forced to be serial? Does this change if you are copying from one hard disk to a different hard disk? Does this change depending on the operating system (Windows / Linux)?

Perhaps it is possible to read in parallel, but not possible to write?

This all assumes that the files being moved/copied are different files going to different locations.

I/O goes through the system cache in RAM before it hits a hard drive. For writes, you may find that copies are fast until RAM is exhausted and then slow down; for reads, repeated reads of the same data are fast. If you copy the same file to several places, it is advantageous to make all the copies of that file before moving on to the next one.

I/O to a single hard drive (or a group of hard drives joined by RAID or a volume manager) is mostly serial, except that the operating system and the drive may reorder operations to read or write nearby tracks before seeking to tracks that are further away. There is some advantage to doing parallel copies because they give more opportunities to reorder, but since the data is really written from the system RAM cache some time after your application writes it, the benefit may be hard to measure.
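Because writes land in the RAM cache first, a naive timing of a copy can look faster than the disk really is. Below is a minimal sketch of a timed copy that asks the OS to flush the destination before the clock stops. The name `timed_copy` and its `sync` parameter are made up for illustration, and the source may still be read from cache, so treat the numbers as rough.

```python
import os
import shutil
import time


def timed_copy(src, dst, sync=True):
    """Copy src to dst and return the elapsed wall-clock seconds.

    With sync=True the destination is flushed and fsync'ed before the
    clock stops, so the time includes pushing the data out of the
    system cache rather than just handing it to the OS.
    """
    start = time.time()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        if sync:
            fout.flush()             # drain Python's own buffer
            os.fsync(fout.fileno())  # ask the OS to write its cache to the device
    return time.time() - start
```

Comparing `timed_copy(src, dst, sync=False)` with `sync=True` on a large file gives a rough idea of how much of the apparent speed comes from the cache.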

There is a greater benefit when moving between drives. Those transfers run mostly in parallel, although there is some contention for the buses (e.g., PCIe, SATA) that connect the drives.

If you have a lot of files to copy, multiprocessing is a reasonable way to go, but you may find that using subprocess to call the native copy utilities is faster.
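A minimal sketch of both options, under some assumptions: the helper names (`copy_one`, `copy_one_native`, `copy_many`) are hypothetical, the native variant assumes a POSIX `cp` is on the PATH, and the pool used here is `multiprocessing.pool.ThreadPool`, which has the same `map` interface as `multiprocessing.Pool` but runs workers as threads. Threads are usually enough for copies because the work is I/O bound; importing `Pool` from `multiprocessing` instead gives separate processes.

```python
import shutil
import subprocess
from multiprocessing.pool import ThreadPool


def copy_one(pair):
    """Copy one (src, dst) pair with shutil and return dst."""
    src, dst = pair
    shutil.copy2(src, dst)  # copies data plus metadata such as timestamps
    return dst


def copy_one_native(pair):
    """Same job, delegated to the platform copy tool (assumes POSIX cp)."""
    src, dst = pair
    subprocess.check_call(["cp", "-p", "--", src, dst])
    return dst


def copy_many(pairs, workers=4, copier=copy_one):
    """Copy every (src, dst) pair using a pool of workers.

    Returns the destination paths in the same order as pairs.
    """
    pool = ThreadPool(workers)
    try:
        return pool.map(copier, pairs)
    finally:
        pool.close()
        pool.join()
```

Usage would look like `copy_many([("a/1.txt", "b/1.txt"), ("a/2.txt", "b/2.txt")])`, or with `copier=copy_one_native` to shell out. If you switch to a process-based `Pool` on Windows, the call has to sit under an `if __name__ == "__main__":` guard. Whether any of this beats a plain loop depends on the drives involved, so it is worth timing on your own hardware.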


 