
Process Multiple XML files at the same time in PHP

Hello, I'm building a component in PHP that reads an Atom feed and gets a list of XML files to process. I need to parse them and insert the data into the database.

For each type of XML (news, scores, schedules) I do something like this:

  1. Get the list of XML files to process
  2. Insert each XML URL into the database with process state = 0
  3. Loop through the list
  4. Open each XML URL and save it to disk
  5. Process it
  6. Set the file state to 1
  7. Go to the next one
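The steps above amount to one sequential loop. A minimal sketch, assuming a PDO connection and a table named `xml_files(id, url, state)` (hypothetical names); the download and processing steps are passed in as callables so the loop can be shown without network access:

```php
<?php
// Minimal sketch of the sequential pipeline described in the steps above.
// Assumptions (hypothetical): a table xml_files(id, url, state) and two
// callables, $download (step 4) and $process (step 5), supplied by the caller.

function enqueue(PDO $db, array $urls): void
{
    // Step 2: store every URL from the Atom feed as pending (state = 0).
    $stmt = $db->prepare('INSERT INTO xml_files (url, state) VALUES (?, 0)');
    foreach ($urls as $url) {
        $stmt->execute([$url]);
    }
}

function processPending(PDO $db, callable $download, callable $process): int
{
    $pending = $db->query('SELECT id, url FROM xml_files WHERE state = 0 ORDER BY id')
        ->fetchAll(PDO::FETCH_ASSOC);
    $markDone = $db->prepare('UPDATE xml_files SET state = 1 WHERE id = ?');

    // Steps 3-7: one file at a time, so throughput is capped at
    // one file per (download time + processing time).
    foreach ($pending as $row) {
        $path = $download($row['url']);
        $process($path);
        $markDone->execute([$row['id']]);
    }
    return count($pending);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE xml_files (id INTEGER PRIMARY KEY, url TEXT, state INTEGER)');

enqueue($db, ['http://example.com/news.xml', 'http://example.com/scores.xml']);
$handled = processPending(
    $db,
    fn ($url) => sys_get_temp_dir() . '/' . md5($url) . '.xml', // fake download
    fn ($path) => null                                          // fake processing
);
```

Because each iteration waits for the previous one to finish, extra RAM and cores sit idle, which is why the pending list keeps growing.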

The thing is, I have plenty of RAM and cores on my machine, but the list keeps growing and the backlog of pending files is always getting bigger.

I want to know how I can process, say, 10 files at the same time, since I have the RAM and cores for it. If I process one at a time, the pending list will just keep growing.

I'd appreciate any ideas, and I apologize for my English.

You could try a divide-and-conquer approach in your step 4. Here is a simple implementation of parallel batch processing.
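The linked implementation is not reproduced here. As a separate, minimal sketch of the same divide-and-conquer idea, the pending list can be split with `array_chunk()` and each chunk handed to a child process created with `pcntl_fork()`. This assumes the CLI SAPI on a Unix-like system with the pcntl extension; `processChunk()` and `processInParallel()` are hypothetical names, and `processChunk()` only writes a marker file where steps 4 to 6 would go:

```php
<?php
// Sketch: divide the pending list into chunks and conquer each chunk in a
// forked child process. Needs the pcntl extension (CLI, Unix-like OS).

// Hypothetical stand-in for steps 4-6 (download, process, mark done).
// Here it only records which process handled each URL.
function processChunk(array $urls, string $outDir): void
{
    foreach ($urls as $url) {
        file_put_contents($outDir . '/' . md5($url), getmypid() . "\n");
    }
}

function processInParallel(array $urls, int $workers, string $outDir): void
{
    if ($urls === []) {
        return;
    }
    // Divide: at most $workers chunks of roughly equal size.
    $chunks = array_chunk($urls, (int) ceil(count($urls) / $workers));

    $children = [];
    foreach ($chunks as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork a worker process');
        }
        if ($pid === 0) {
            // Child: handle this chunk only, then exit so it never runs the parent's loop.
            processChunk($chunk, $outDir);
            exit(0);
        }
        $children[] = $pid; // Parent: remember the child and keep forking.
    }

    // Wait until every child has finished.
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status);
    }
}

$outDir = sys_get_temp_dir() . '/xml-batch-' . getmypid();
if (!is_dir($outDir)) {
    mkdir($outDir);
}
$urls = array_map(fn ($i) => "http://example.com/feed/$i.xml", range(1, 25));
processInParallel($urls, 10, $outDir); // 25 URLs in chunks of 3 -> 9 child processes
```

If the children write to the database, each child should open its own connection after the fork; a PDO connection inherited from the parent should not be used by several processes at once.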

You could also try parallel cURL requests. This PHP class provides an easy interface for running multiple concurrent cURL requests.
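The linked class is not reproduced here, and its API is not assumed. As a sketch of the underlying mechanism, PHP's built-in `curl_multi_*` functions can run several downloads concurrently inside one process. `downloadAll()` is a hypothetical helper name, and the curl extension is required:

```php
<?php
// Sketch: download several XML files concurrently in one PHP process using
// the built-in curl_multi_* functions (curl extension required).
// Returns [url => body]; the body is null if that transfer failed.

function downloadAll(array $urls, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,   // keep the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_FAILONERROR => true,      // treat HTTP status >= 400 as a failure
            CURLOPT_TIMEOUT => $timeoutSeconds,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Run all transfers at the same time until none is still active.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for socket activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the handles whose transfer ended with a curl error.
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = in_array($ch, $failed, true) ? null : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $bodies;
}
```

To keep at most 10 downloads in flight, call `downloadAll()` on each element of `array_chunk($pending, 10)` in turn; URLs that come back as `null` can simply stay at state 0 for a later retry.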

You're using the database as a queue. This is normally discouraged (there is software that does it better), and your example runs into a typical problem with that approach:

The process state field is initialized with the value 0. You then process each entry whose value is 0. Let's say processing an entry takes 10 minutes and you insert one URL per minute. Then you need to process 10 URLs in parallel to keep up with the insertion rate. Let's play this through:

  • In the first minute you insert the first URL and start processing it. Since each of the 10 processors takes the first URL with status 0, all 10 processors process the first URL.

  • In the second minute you insert the second URL, and all ten processors are still working on the first URL.

  • In the third minute you insert the third URL, and all ten processors are still working on the first URL.

And so on. You get the picture. The status is not managed properly. Since you are designing the queue system yourself, you need to make sure it works when entries are processed in parallel. You should create a component for that and test it thoroughly with fake data and logging so that you can track and verify its operation. Then use that component for the real thing. It might not do everything you want, but it should be much more robust.
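A minimal sketch of such a component, assuming the same kind of table as in the question plus a hypothetical `worker` column and a third state, 2, meaning "being processed". The claim step is a compare-and-set `UPDATE` that only matches while the row is still pending, so when two workers pick the same candidate, exactly one wins and the other moves on. The names `xml_files`, `tryClaim`, `claimNext` and `markDone` are illustrative; the fake-data run at the end replays the race from the scenario above against an in-memory SQLite database:

```php
<?php
// Sketch of a database-backed queue that is safe for parallel workers.
// Assumed (hypothetical) schema: xml_files(id, url, state, worker) with
// state 0 = pending, 2 = being processed, 1 = done.

const PENDING = 0;
const DONE = 1;
const PROCESSING = 2;

// Compare-and-set: the UPDATE only matches while the row is still pending,
// so at most one worker can ever own a given row.
function tryClaim(PDO $db, int $id, string $worker): bool
{
    $stmt = $db->prepare(
        'UPDATE xml_files SET state = ?, worker = ? WHERE id = ? AND state = ?'
    );
    $stmt->execute([PROCESSING, $worker, $id, PENDING]);
    return $stmt->rowCount() === 1;
}

// Claims the oldest pending row for $worker, or returns null if none is left.
function claimNext(PDO $db, string $worker): ?array
{
    $find = $db->prepare('SELECT id, url FROM xml_files WHERE state = ? ORDER BY id LIMIT 1');
    while (true) {
        $find->execute([PENDING]);
        $row = $find->fetch(PDO::FETCH_ASSOC);
        $find->closeCursor();
        if ($row === false) {
            return null;
        }
        if (tryClaim($db, (int) $row['id'], $worker)) {
            error_log("$worker claimed {$row['url']}"); // log every claim for verification
            return $row;
        }
        // Another worker claimed this row first; look for the next pending one.
    }
}

function markDone(PDO $db, int $id): void
{
    $db->prepare('UPDATE xml_files SET state = ? WHERE id = ?')->execute([DONE, $id]);
}

// Fake-data run in an in-memory SQLite database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE xml_files (id INTEGER PRIMARY KEY, url TEXT, state INTEGER, worker TEXT)');
$insert = $db->prepare('INSERT INTO xml_files (url, state) VALUES (?, 0)');
foreach (range(1, 3) as $i) {
    $insert->execute(["http://example.com/$i.xml"]);
}

// Replay the race: workers A and B both saw row 1 as pending.
$claimedByA = tryClaim($db, 1, 'worker-A'); // true
$claimedByB = tryClaim($db, 1, 'worker-B'); // false, row 1 is no longer pending
$nextForB = claimNext($db, 'worker-B');     // row 2
```

A worker that crashes leaves its row at state 2, so a real component would also need something like a claimed-at timestamp (again hypothetical) to hand stale rows back out. On MySQL 8.0+ or PostgreSQL 9.5+, `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction is another way to let concurrent workers pick different rows.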

Alternatively, use an existing queue component that has already been built, tested, and proven in production.

