
Fast shared object containing strings for python multiprocessing

I have written some code that runs in parallel using the Python multiprocessing library. The scheme is very simple: multiple workers produce strings, and one worker consumes and analyzes them.

Currently I am using a multiprocessing queue to store the strings, created as follows:

import multiprocessing as multi

manager = multi.Manager()
queue = manager.Queue(20)  # bounded queue, shared via the Manager process

and data are put and retrieved this way:

queue.put(string)
queue.get(timeout=5)

Some profiling, together with observation in htop, led me to the conclusion that these operations are very CPU demanding.

My question, then, is: is there a better (faster) way to share a store of strings in the described scheme?

Note that I don't really care whether the storage is FIFO, although that would be preferable.

The operations are demanding because they have to do a lot of locking and communication between processes (remember that Manager spawns a new process that holds the shared objects).

To avoid this kind of overhead, you should reduce the amount of communication to a minimum. For example, instead of sending one string at a time, send a sequence of N strings (the number of strings per payload depends on the application, and you'll have to test which batch size works best). The consuming worker can get these sequences, process all the strings, and put the results together into the output queue; see the sketch below.
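A minimal sketch of this batching idea. The producer/consumer functions, BATCH_SIZE, the generated test strings, and the None sentinel are all illustrative choices, not part of the original code:

import multiprocessing as multi

BATCH_SIZE = 100  # hypothetical batch size; tune it for your workload

def producer(queue, n_items):
    # Accumulate strings locally and send them in batches,
    # so each put() amortizes the IPC cost over many strings.
    batch = []
    for i in range(n_items):
        batch.append(f"string-{i}")  # stand-in for real string production
        if len(batch) >= BATCH_SIZE:
            queue.put(batch)
            batch = []
    if batch:
        queue.put(batch)  # flush the final partial batch
    queue.put(None)       # sentinel: this producer is finished

def consumer(queue, n_producers):
    # Drain batches until every producer has sent its sentinel.
    finished = 0
    analyzed = 0
    while finished < n_producers:
        batch = queue.get(timeout=5)
        if batch is None:
            finished += 1
            continue
        analyzed += len(batch)  # stand-in for the real string analysis
    print(f"analyzed {analyzed} strings")

if __name__ == "__main__":
    manager = multi.Manager()
    queue = manager.Queue(20)
    producers = [multi.Process(target=producer, args=(queue, 1000))
                 for _ in range(4)]
    cons = multi.Process(target=consumer, args=(queue, len(producers)))
    for p in producers:
        p.start()
    cons.start()
    for p in producers:
        p.join()
    cons.join()

The one-sentinel-per-producer pattern lets the consumer know when all input has arrived without needing a separate coordination channel.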

From a simple micro-benchmark, it seems the time taken to put() an object is at least 0.1 milliseconds. If the object is complex, the time can go up to several milliseconds. If you are sending very small strings, the time taken to actually process them might be only a few microseconds, or perhaps tens of microseconds.
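If you want to reproduce that kind of number on your own machine, a sketch along these lines will do. The iteration count and test string are arbitrary; this is one possible benchmark, not necessarily the one used above:

import multiprocessing as multi
import time

if __name__ == "__main__":
    manager = multi.Manager()
    queue = manager.Queue()  # unbounded, so put() never blocks here

    n = 10000
    payload = "a small test string"
    start = time.perf_counter()
    for _ in range(n):
        queue.put(payload)
    elapsed = time.perf_counter() - start
    print(f"average put(): {elapsed / n * 1e6:.1f} microseconds")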
