Optimizing Bulk Uploads to Rackspace Cloud Files via PHP

We have an application that parses data from external sources and localizes it back, saving and resizing images as the final step of the process. Given the volume we process [2 million images to date], we've been using Rackspace Cloud Files to host the data...

require('/var/libs/rackspace/cloudfiles.php');
$auth = new CF_Authentication('xxx', 'yyyy');
$auth->authenticate();
$conn = new CF_Connection($auth,true);
$container = $conn->get_container('some container');

foreach ($lotsofitems as $onitem){

    // check the record
    // save the image to disk with cURL
    // resize it into 4 more versions
    // post it to rackspace

    // upload each resized version that exists, then remove its temp file
    foreach (array('_full', '_big', '_med') as $suffix) {
        $name = $image_id . $suffix . $image_type;
        $path = '/var/temp/' . $name;

        if (file_exists($path)) {
            $object = $container->create_object($name);
            $object->load_from_filename($path);
            unlink($path); // remove the temp save
        }
    }

    // delete the original
    // repeat

}

After optimizing our parser, GD, etc., we benchmarked the process: processing an image takes about 1 second, but transferring the 5 image variations to Rackspace takes 2-5 seconds per item and at times spikes to 10+ seconds.

  • get image: 1341964436
  • got image: 1341964436
  • resized image: 1341964437
  • clouded one image: 1341964446
  • clouded image: 1341964448
  • finished with image: 1341964448
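
To make the breakdown in that log explicit, here is a small sketch that subtracts consecutive timestamps. The labels and values are copied from the log above; the `step_durations` helper is a hypothetical name introduced only for illustration, and the code assumes PHP 7 or later.

```php
<?php
// Timestamps copied from the log above (Unix seconds).
$log = [
    'get image'           => 1341964436,
    'got image'           => 1341964436,
    'resized image'       => 1341964437,
    'clouded one image'   => 1341964446,
    'clouded image'       => 1341964448,
    'finished with image' => 1341964448,
];

/**
 * Hypothetical helper: seconds elapsed between each consecutive pair of
 * log entries, keyed by the label of the later entry.
 */
function step_durations(array $log): array
{
    $durations = [];
    $previous = null;
    foreach ($log as $label => $timestamp) {
        if ($previous !== null) {
            $durations[$label] = $timestamp - $previous;
        }
        $previous = $timestamp;
    }
    return $durations;
}

$durations = step_durations($log);
foreach ($durations as $label => $seconds) {
    printf("%-20s +%ds\n", $label, $seconds);
}
```

For this one sample, fetching and resizing account for 1 second and the uploads for the remaining 11 of the 12 seconds. If "clouded one image" marks the end of the first upload, that first transfer alone took 9 seconds.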

Some additional points:

  1. Our processing servers are on Rackspace's cloud as well.
  2. There are 5 total image versions, ranging from around 30 KB down to 2 KB.
  3. All images are saved to disk before the transfer and removed after.
  4. Our containers [we use several overall but one per item] are CDN enabled.

Does anyone have suggestions for bulk transfers to Rackspace? Should we be reconnecting after a certain duration or number of requests? Optimizing our connection some other way? Or is it just a matter of forking the processes and running a lot of calls?
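
As a point of reference for the "lot of calls" option, the uploads for one item can also be issued concurrently from a single process with PHP's `curl_multi_*` functions instead of forking. The sketch below is not part of the php-cloudfiles library: it sends plain `PUT` requests to the Cloud Files (OpenStack Swift) REST endpoint with an `X-Auth-Token` header. The function names `object_url` and `upload_parallel` are hypothetical, and it assumes PHP 7+ with the cURL extension and that you can obtain the storage URL and auth token after authenticating.

```php
<?php
/**
 * Hypothetical helper: build the URL of an object inside a container.
 * Note: rawurlencode() also encodes "/" in object names.
 */
function object_url(string $storageUrl, string $container, string $objectName): string
{
    return rtrim($storageUrl, '/') . '/'
        . rawurlencode($container) . '/'
        . rawurlencode($objectName);
}

/**
 * Hypothetical helper: upload several local files concurrently.
 * $files maps object name => local path.
 * Returns object name => HTTP status code (201 means the object was created,
 * 0 means the file could not be read or no response was received).
 */
function upload_parallel(string $storageUrl, string $authToken, string $container, array $files): array
{
    $multi = curl_multi_init();
    $handles = [];
    $streams = [];
    $results = [];

    foreach ($files as $objectName => $path) {
        if (!is_readable($path)) {
            $results[$objectName] = 0;
            continue;
        }
        $stream = fopen($path, 'rb');
        $ch = curl_init(object_url($storageUrl, $container, $objectName));
        curl_setopt_array($ch, [
            CURLOPT_UPLOAD         => true, // makes cURL send an HTTP PUT
            CURLOPT_INFILE         => $stream,
            CURLOPT_INFILESIZE     => filesize($path),
            CURLOPT_HTTPHEADER     => ['X-Auth-Token: ' . $authToken],
            CURLOPT_RETURNTRANSFER => true,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$objectName] = $ch;
        $streams[$objectName] = $stream;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    foreach ($handles as $objectName => $ch) {
        $results[$objectName] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
        fclose($streams[$objectName]);
    }
    curl_multi_close($multi);

    return $results;
}
```

A call would look like `upload_parallel($storageUrl, $token, 'some container', ['123_full.jpg' => '/var/temp/123_full.jpg', ...])`. With php-cloudfiles, the storage URL and token are, as far as I recall, available as `$auth->storage_url` and `$auth->auth_token` after `authenticate()`, but check this against your version of the library. The sketch sets no `Content-Type` or checksum header, which the library may otherwise handle for you, so treat it as a starting point for benchmarking rather than a drop-in replacement.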

Have you tried using CloudFuse? It allows you to mount Rackspace Cloud Files containers as a local filesystem.

I have used this and it's pretty good - the guy who made it works for Rackspace.

http://sandeepsidhu.wordpress.com/2011/03/07/mounting-cloud-files-using-cloudfuse-into-ubuntu-10-10-v2/
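
With a mount in place, the upload step reduces to an ordinary file copy. A minimal sketch, assuming the container is mounted at a directory such as `/mnt/cloudfiles/some container` (a hypothetical path) and PHP 7+; `move_to_mount` is a name made up for this example:

```php
<?php
/**
 * Hypothetical helper: copy a finished image into a mounted container
 * directory, then delete the local temp file.
 * Returns true only if both the copy and the delete succeed.
 */
function move_to_mount(string $localPath, string $mountedContainerDir): bool
{
    $target = rtrim($mountedContainerDir, '/') . '/' . basename($localPath);

    // copy() + unlink() rather than rename(), since the mount is a
    // different filesystem from the temp directory.
    if (!copy($localPath, $target)) {
        return false;
    }
    return unlink($localPath);
}
```

Inside the loop this would replace the `create_object` / `load_from_filename` / `unlink` calls with something like `move_to_mount('/var/temp/' . $image_id . '_full' . $image_type, $mountDir)`. The mount still talks to the Cloud Files API underneath, so it simplifies the code but is worth benchmarking before assuming it is faster.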
