
PHP Curl Performance Bottleneck Making Google Maps Geocoding Requests

I am using PHP and cURL to make HTTP reverse-geocoding (lat, long -> address) requests to Google Maps. I have a Premier account, so we can make a lot of requests without being throttled or blocked.

Unfortunately, I have reached a performance limit. We get approximately 500,000 requests daily that need to be reverse geocoded.

The code is quite trivial (I will write pieces in pseudo-code for the sake of saving time and space). The following code fragment is called every 15 seconds via a job.

<?php
    //get requests from database
    $requests = get_requests();

    foreach ($requests as $request) {
        //build up the url string to send to google
        $url = build_url_string($request->latitude, $request->longitude);

        //make the curl request
        $response = Curl::get($url);

        //write the response address back to the database
        write_response($response);
    }

    class Curl {
        public static function get($p_url, $p_timeout = 5) {
            $curl_handle = curl_init();
            curl_setopt($curl_handle, CURLOPT_URL, $p_url);
            curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, $p_timeout);
            curl_setopt($curl_handle, CURLOPT_TIMEOUT, $p_timeout);
            curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);

            $response = curl_exec($curl_handle);
            curl_close($curl_handle);

            return $response;
        }
    }
?>

The performance problem seems to be the cURL requests. They are extremely slow, probably because each operation makes a full HTTP request. We have a 100 Mbps connection, but the script running at full speed only utilizes about 1 Mbps. The load on the server is essentially nothing. The server is a quad-core with 8 GB of memory.

What can we do to increase the throughput? Is there a way to open a persistent (keep-alive) HTTP connection to Google Maps? How about spreading the work out horizontally, i.e. making 50 concurrent requests?

Thanks.

Some things I would do:

  • No matter how "premium" you are, making external HTTP requests will always be a bottleneck, so for starters, cache request+response pairs - you can still refresh them via cron on a regular basis.

  • These are single HTTP requests - you will never get "full speed" with them, especially when the request and response are that small (< 1 MB), because of TCP handshaking, headers, etc. So try using multicurl (if your premium plan allows it) to start multiple requests at once - this should give you full speed ;) (see the sketch after this list).

  • add "Connection: close" in the request header you send, this will immediately close the http connection so your and google's server won't get hammered with halfopen

Considering you are running all your requests sequentially, you should look into dividing the work up across multiple machines or processes so they can run in parallel. Judging by your numbers, you are limited by how long each cURL request takes, not by CPU or bandwidth.

My first guess would be to look at a queuing system (Gearman, RabbitMQ); a rough sketch with Gearman follows.
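
As an illustration only: a rough sketch of the queue approach using the PECL gearman extension. The job name 'reverse_geocode', the payload format, and the server address are made up for this example, and it reuses the build_url_string(), Curl::get() and write_response() helpers from the question.

<?php
    // producer.php - run by the existing 15-second job: instead of doing the
    // geocoding inline, push each request onto the queue as a background job.
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);   // assumed gearmand host/port

    foreach (get_requests() as $request) {
        // 'reverse_geocode' is an arbitrary job name chosen for this sketch
        $client->doBackground('reverse_geocode', json_encode(array(
            'latitude'  => $request->latitude,
            'longitude' => $request->longitude,
        )));
    }
?>

<?php
    // worker.php - start as many copies of this as you want, on one or more
    // machines; each one pulls jobs off the queue and processes them in parallel.
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);

    $worker->addFunction('reverse_geocode', function (GearmanJob $job) {
        $payload  = json_decode($job->workload(), true);
        $url      = build_url_string($payload['latitude'], $payload['longitude']);
        $response = Curl::get($url);        // the helper class from the question
        write_response($response);
    });

    while ($worker->work());
?>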
