
Python Multiprocessing: is locking appropriate for (large) disk writes?

I have multiprocessing code wherein each process does a disk write (pickling data), and the resulting pickle files can be upwards of 50 MB (and sometimes even more than 1 GB depending on what I'm doing). Also, different processes are not writing to the same file; each process writes a separate file (or set of files).

Would it be a good idea to implement a lock around disk writes so that only one process is writing to the disk at a time? Or would it be best to just let the operating system sort it out, even if that means 4 processes may be trying to write 1 GB to the disk at the same time?

As long as the processes aren't fighting over the same file, let the OS sort it out. That's its job.
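A minimal sketch of the no-lock approach, assuming each worker is handed its own output path. The names `write_result` and `run_workers`, the `worker_<n>.pkl` file naming, and the stand-in payload are all hypothetical and chosen only for illustration.

```python
import multiprocessing as mp
import os
import pickle


def write_result(args):
    """Pickle one worker's payload to its own file; no lock is taken."""
    index, out_dir = args
    # Stand-in data; a real worker would compute something here.
    payload = list(range(index * 1000, (index + 1) * 1000))
    path = os.path.join(out_dir, f"worker_{index}.pkl")
    with open(path, "wb") as fh:
        pickle.dump(payload, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def run_workers(n_workers, out_dir):
    """Start n_workers processes that each write a separate file concurrently."""
    jobs = [(i, out_dir) for i in range(n_workers)]
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(write_result, jobs)
```

Call `run_workers(4, some_dir)` from under an `if __name__ == "__main__":` guard so it also works on platforms that start processes by spawning. Since no two workers share a path, nothing here needs synchronizing.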

Unless your processes try to dump their data in one big write, the OS is in a better position to schedule disk writes. If you do use one big write, you might try partitioning it into smaller chunks. That might give the OS a better chance of handling them.
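One way to split a single large write into smaller ones might look like the sketch below. The helper name `dump_in_chunks` and the 4 MiB chunk size are assumptions for illustration, not values from the question; whether chunking helps at all is something to measure on your own system.

```python
import pickle

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; an arbitrary illustrative value


def dump_in_chunks(obj, path, chunk_size=CHUNK_SIZE):
    """Serialize obj once, then write the bytes in fixed-size pieces."""
    blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    view = memoryview(blob)  # slicing a memoryview does not copy the data
    with open(path, "wb") as fh:
        for start in range(0, len(view), chunk_size):
            fh.write(view[start:start + chunk_size])
    return len(blob)  # number of bytes written
```

The resulting file is byte-for-byte the same as one produced by a single write, so `pickle.load` reads it back unchanged.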

Of course you will hit a limit somewhere. Your program might be CPU-bound, memory-bound or disk-bound, and it might hit different limits depending on the input or load. But unless you've got evidence that you're constantly disk-bound and you've got a good idea how to solve that, I'd say don't bother. The days when a write system call actually meant that the data was sent directly to disk are long gone.

Most operating systems these days use unallocated RAM as a disk cache, and HDDs have built-in caches as well. Unless you disable both of these (which will give you a huge performance hit), there is precious little connection between your program completing a write and the data actually hitting the platters or flash.
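To make the gap between "write returned" and "data is on the device" concrete, here is a small sketch of what a program has to do when it wants to push data out of the caches explicitly. The helper name `write_durably` is hypothetical. `os.fsync` asks the OS to flush its cache for that file; whether the drive's own cache is also flushed depends on the OS and hardware.

```python
import os


def write_durably(path, data):
    """Write bytes, then ask the OS to push them to the storage device."""
    with open(path, "wb") as fh:
        fh.write(data)         # may only reach Python's buffer or the OS page cache
        fh.flush()             # hand Python's buffered bytes to the OS
        os.fsync(fh.fileno())  # ask the OS to write its cached pages to the device
```

Without the last two calls, `write` can return long before anything is physically stored, which is why serializing writes with a lock tells you little about what the disk is actually doing.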

You might consider using a memory-mapped file (memmap), if your OS supports it, and let the OS's virtual memory do the work for you. See e.g. the architect notes for the Varnish cache.
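A rough sketch of writing through a memory mapping, assuming Python's standard `mmap` module is an acceptable stand-in for the "memmap" mentioned above. The helper name `write_via_mmap` is hypothetical.

```python
import mmap


def write_via_mmap(path, data):
    """Copy bytes into a file through a memory mapping."""
    size = len(data)
    if size == 0:
        raise ValueError("cannot memory-map an empty file")
    with open(path, "w+b") as fh:
        fh.truncate(size)  # the file must already span the mapped length
        with mmap.mmap(fh.fileno(), size) as mm:
            mm[:] = data   # an in-memory copy; the OS writes dirty pages back
            mm.flush()     # optional: request write-back now
```

The assignment to `mm[:]` is just a memory copy from the program's point of view; when the pages reach the disk is left to the OS's virtual memory system.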
