I have a PHP application running on a clustered server. It copies a file from an AWS S3 bucket to the server, processes the file (unzips it, converts the PDF to XML using iText (Java), reads the XML and saves the data to the database), and then uploads the processed file back to the bucket.
It works fine on a single instance, but with load balancing across multiple instances, the file being processed on the server disappears. I cannot process the file directly from the bucket, because I cannot unzip it in the bucket and I cannot run a jar file on the bucket either. So I have to store the file temporarily for processing. Is there any way to handle this situation?
There can be multiple solutions to this:
One solution is to use object tags: when a file has been processed, apply a tag such as processed=true at the time of upload,
and when you download files, check for that tag and skip any file that already has it.
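A minimal sketch of the tagging approach, written in Python with boto3-style calls rather than the question's PHP (the AWS SDK for PHP exposes the same S3 tagging operations). It assumes `s3` is a `boto3.client("s3")` or any object with the same two methods; the tag name comes from the answer above. Note that "check, then tag later" is not atomic: two instances can both see a file as unprocessed before either tags it, so on its own this reduces rather than eliminates double processing.

```python
# Tag-based "already processed?" check for S3 objects.
# `s3` is assumed to be a boto3 S3 client (boto3.client("s3")).

PROCESSED_TAG = "processed"  # tag key from the answer; value is "true"


def is_processed(s3, bucket, key):
    """Return True if the object carries the tag processed=true."""
    resp = s3.get_object_tagging(Bucket=bucket, Key=key)
    return any(
        tag["Key"] == PROCESSED_TAG and tag["Value"] == "true"
        for tag in resp.get("TagSet", [])
    )


def mark_processed(s3, bucket, key):
    """Add processed=true while keeping any tags the object already has.

    put_object_tagging replaces the whole tag set, so the existing tags
    are read first and merged with the new one.
    """
    resp = s3.get_object_tagging(Bucket=bucket, Key=key)
    tags = [t for t in resp.get("TagSet", []) if t["Key"] != PROCESSED_TAG]
    tags.append({"Key": PROCESSED_TAG, "Value": "true"})
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": tags})
```

A worker would call `is_processed` before downloading and `mark_processed` after uploading the result.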
A better solution is to use AWS Lambda for this task.
You can use the pattern of
Or just have Lambda do all the work on S3 upload, depending on how long the process runs. The maximum execution time was 5 minutes when this was written (it has since been raised to 15 minutes). http://docs.aws.amazon.com/lambda/latest/dg/limits.html
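As a rough sketch of the "Lambda does all the work" variant, a Python handler only needs to pull the bucket and key out of the S3 notification event. `process_file` is a hypothetical placeholder for the unzip / PDF-to-XML / database steps (in a real function it would download into Lambda's `/tmp` storage first), and the handler name is whatever you configure; the whole run must fit within the Lambda timeout.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Yield (bucket, key) for each record in an S3 notification event.

    Object keys arrive URL-encoded (e.g. spaces become '+'), so decode them.
    """
    for record in event.get("Records", []):
        s3_info = record["s3"]
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def process_file(bucket, key):
    """Hypothetical placeholder for: download, unzip, PDF -> XML, save to DB."""
    return f"processed s3://{bucket}/{key}"


def lambda_handler(event, context):
    # Entry point Lambda calls with the S3 event; `context` is unused here.
    results = [process_file(b, k) for b, k in s3_objects_from_event(event)]
    return {"processed": results}
```

Because each invocation gets its own environment, there is no shared local disk for another instance to interfere with.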
For example:
Set up a Lambda function to watch the S3 bucket for new-object (upload) events. Then have the Lambda function drop a message into SQS (from the event data it receives, the Lambda function knows the source bucket name and object key name). The server polls the queue, takes a message, processes the file, uploads the result to a new bucket, deletes the file from the old S3 bucket and then deletes the message from the queue. If the server dies during processing, the message becomes visible on the queue again once its visibility timeout expires, and another instance picks it up. To make sure every file is processed and deleted from the old bucket, enable versioning and a lifecycle policy on it: when processing a message, if the file no longer exists in the old bucket, send an alert and/or check for the previous version. The lifecycle policy on the old bucket can then permanently delete versions older than X days.
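The S3 → Lambda → SQS → server flow above can be sketched in Python with boto3-style clients (`boto3.client("sqs")` and `boto3.client("s3")` are assumed). The JSON message format `{"bucket": ..., "key": ...}`, the output bucket, the 900-second visibility timeout and the `process` callback are all hypothetical choices for illustration. The key point is that the message is deleted only after every step succeeds, so a crash leaves it on the queue, and the temporary directory is private to the one instance holding the message.

```python
import json
import os
import tempfile
from urllib.parse import unquote_plus


def forward_s3_event(event, sqs, queue_url):
    """Lambda side: turn each S3 upload record into one SQS message."""
    for record in event.get("Records", []):
        body = {
            "bucket": record["s3"]["bucket"]["name"],
            "key": unquote_plus(record["s3"]["object"]["key"]),
        }
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(body))


def handle_one_message(sqs, s3, queue_url, output_bucket, process):
    """Server side: receive at most one message and process its file.

    Returns False if the queue was empty. The message is deleted only at
    the end; if this instance dies earlier, the message reappears after
    its visibility timeout and another instance retries it.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,     # long polling
        VisibilityTimeout=900,  # assumed: longer than worst-case processing
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False
    msg = messages[0]
    body = json.loads(msg["Body"])
    bucket, key = body["bucket"], body["key"]

    # Local scratch space, removed automatically when the block exits.
    with tempfile.TemporaryDirectory() as workdir:
        local_in = os.path.join(workdir, "input")
        s3.download_file(bucket, key, local_in)
        local_out = process(local_in, workdir)  # hypothetical: unzip, PDF->XML, DB
        s3.upload_file(local_out, output_bucket, key)

    s3.delete_object(Bucket=bucket, Key=key)  # remove from the old bucket
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return True
```

Each server would call `handle_one_message` in a loop; because SQS hides a received message from other consumers, two instances do not normally work on the same file at once.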
Monitoring S3 with Lambda
http://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
S3 Versioning
http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html
Select Permanently delete previous versions and then enter the number of days after an object becomes a previous version to permanently delete the object (for example, 455 days). http://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
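The same versioning and lifecycle settings can be applied from code. This is a hedged Python sketch with a boto3-style S3 client; the rule ID is a hypothetical name, 455 days is just the example from the AWS text above, and the previous-version lookup only reads the first page of `list_object_versions` results (up to 1,000 entries) for simplicity.

```python
def noncurrent_expiration_rule(days, rule_id="expire-old-versions"):
    """Lifecycle config that permanently deletes versions `days` after
    they stop being the current version. rule_id is a hypothetical name."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


def apply_retention(s3, bucket, days):
    """Enable versioning, then attach the noncurrent-version expiry rule."""
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=noncurrent_expiration_rule(days)
    )


def newest_version_id(s3, bucket, key):
    """Return the VersionId of the newest stored version of exactly `key`,
    or None. After a delete, this is the version hidden behind the delete
    marker, which can be fetched with get_object(..., VersionId=...)."""
    resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
    versions = [v for v in resp.get("Versions", []) if v["Key"] == key]
    if not versions:
        return None
    return max(versions, key=lambda v: v["LastModified"])["VersionId"]
```

A worker that finds a file missing from the old bucket could call `newest_version_id` before raising an alert.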
What you need is a system that will store the file without losses. There are many alternatives for that:
a) Another server
b) An SQS queue. @strongiz's answer above explains it very well.
c) Even another database.
In each of these cases, you need a flag that records whether the file has been processed or not. When file processing is complete, either:
a) delete the file, or
b) change the flag.
Since PHP is session oriented (and each instance behind the load balancer has its own local storage), you can't store this state there permanently, so you need to connect to another interface. In the case of a database, you can store the file path and a flag that records whether the file has been processed. A combination of the three might also work.
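A minimal sketch of the database-flag idea, using Python's built-in `sqlite3` with an in-memory database as a stand-in for the shared database (for example MySQL) that all instances would connect to. The table and column names are hypothetical. The conditional `UPDATE ... WHERE status = 'pending'` makes claiming a file atomic, so only one instance wins; a real system would also need a timeout that resets rows left in `processing` by a crashed worker.

```python
import sqlite3

# Hypothetical schema: one row per file, keyed by its path/URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path   TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending | processing | done
    worker TEXT
)
"""


def register(conn, path):
    """Record a new file; registering the same path twice is a no-op."""
    conn.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    conn.commit()


def claim(conn, path, worker):
    """Atomically move a pending file to 'processing' for this worker.

    Returns True only for the single caller whose UPDATE changed the row.
    """
    cur = conn.execute(
        "UPDATE files SET status = 'processing', worker = ? "
        "WHERE path = ? AND status = 'pending'",
        (worker, path),
    )
    conn.commit()
    return cur.rowcount == 1


def mark_done(conn, path):
    """Flip the flag once the file has been fully processed."""
    conn.execute("UPDATE files SET status = 'done' WHERE path = ?", (path,))
    conn.commit()
```

Usage: each instance calls `claim` before downloading a file and only proceeds when it returns True.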