
Can I share a large array in memory between PHP processes?

I use PHP to do a lot of data processing (realizing I'm probably pushing into territory where I should be using other languages and/or techniques).

I'm doing entity extraction with a PHP process that loads into memory an array of the n-grams to look for. That array uses 3GB of memory and takes about 20 seconds to load each time I launch a process. I generate it once locally on the machine, and each process loads it from a .json file. Each process then tokenizes the text it's processing and does an array_intersect between the two arrays to extract entities.

Is there any way to preload this into memory on the machine that is running all these processes and then share the resource across all the processes?

Since it's probably not possible with PHP: what kinds of languages or methods should I be researching to do this sort of entity extraction more efficiently?

If the array never gets modified after it's loaded, then you could use pcntl_fork() and fork off a bunch of copies of the script. With copy-on-write semantics, they'd all be reading from the exact same memory copy of the array.

However, as soon as the array gets modified, you'll pay a huge penalty as the array gets copied into that forked child's memory space. This would be especially true if any of the scripts finish their run early: when one shuts down, its PHP process starts shutdown cleanup, and that counts as a write on the array's memory space, causing the copying.
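The fork approach described above could be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: it assumes the pcntl extension is available (PHP CLI on a Unix-like system), and the names `extract_entities`, `run_forked_jobs`, and the per-job output files are hypothetical, introduced only for this example. The array is loaded once in the parent, and the children only read it.

```php
<?php
// Sketch only: requires the pcntl extension (PHP CLI on a Unix-like system).

/** Hypothetical helper: the tokens of $text that also appear in $ngrams. */
function extract_entities(string $text, array $ngrams): array
{
    // Naive whitespace tokenizer, used here only for illustration.
    $tokens = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_intersect($tokens, $ngrams)));
}

/**
 * Forks one child per job. Each child reads $ngrams (it never writes to it)
 * and stores its result as JSON in "$outDir/<jobId>.json".
 * Returns a map of jobId => child exit code (-1 if the child did not exit normally).
 */
function run_forked_jobs(array $ngrams, array $jobs, string $outDir): array
{
    $children = [];
    foreach ($jobs as $jobId => $text) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("Could not fork for job $jobId");
        }
        if ($pid === 0) {
            // Child process: $ngrams was loaded before the fork, so it is
            // not loaded again here.
            $entities = extract_entities($text, $ngrams);
            $ok = file_put_contents("$outDir/$jobId.json", json_encode($entities));
            exit($ok === false ? 1 : 0);
        }
        $children[$pid] = $jobId; // parent: remember which child runs which job
    }

    // Parent: wait for every child and collect its exit code.
    $exitCodes = [];
    foreach ($children as $pid => $jobId) {
        pcntl_waitpid($pid, $status);
        $exitCodes[$jobId] = pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
    }
    return $exitCodes;
}
```

In the scenario from the question, the parent would run something like `$ngrams = json_decode(file_get_contents('ngrams.json'), true);` once (the file name is an assumption) and then pass `$ngrams` to `run_forked_jobs()`. Forking one child per job is only for brevity; a real script would probably cap the number of concurrent children.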

In your case, the best way of sharing might be read-only mmap access.

I don't know if this is possible in PHP. A lot of languages will allow you to mmap a file into memory - and your operating system will be smart enough to realize that read-only maps can be shared. Also, if you don't need all of it, the operating system can reclaim the memory, and load it again from disk as necessary. In fact, it may even allow you to map more memory than you physically have.

mmap is really elegant. Nevertheless, dealing with such mapped data in PHP will likely be a pain, and slow. In general, PHP is slow. In benchmarks, it is common to see PHP come in at 40-50 times the runtime of a good C program. This is much worse than, e.g., Java, where a good Java program is only about twice as slow as highly optimized C; there it may pay off to have Java's powerful development tools instead of having to debug low-level C code. But PHP does not have any key benefit: it is neither elegant to write, nor does it have a superior toolchain, nor is it fast...


 