
PHP cURL multi handling causing random connection issues between servers?

I have a website that tracks individual players' data for an online game. Every day at the same time a cron job runs that uses cURL to fetch each player's data from the game company's server (each player requires their own page to fetch). Previously I looped through the players, creating one cURL request at a time and storing the data - while this was a slow process, everything worked fine for weeks (doing anywhere from 500-1,000 players every day).

As we gained more players the cron started to take too long to run, so I rewrote it using ParallelCurl (cURL multi handling) about a week ago. It was set to open no more than 10 connections at a time and was running perfectly - doing about 3,000 pages in 3-4 minutes. I never noticed anything wrong until a day or two later, when I was randomly unable to connect to their servers (getting an HTTP code of 0). I thought I was permanently banned/blocked until, about 1-2 hours later, I could suddenly connect again. The block occurred several hours after the cron had run for the day - the only requests being made at the time were the occasional single-file requests (which have been working fine and left untouched for months).

The past few days have all been like this. Cron runs fine, then sometime later (a few hours) I can't get a connection for an hour or two. Today I updated the cron to only open 5 connections at a time - everything worked fine until 5-6 hours later I couldn't connect for 2 hours.

I've done a ton of googling and can't seem to find anything useful. My guess is that a firewall is blocking my connection, but I'm really in over my head when it comes to anything like that. I am clueless as to what is happening and what I need to do to fix it. I'd be grateful for any help - even a guess or just a point in the right direction.

Note that I'm using a shared web host (HostGator). Two days ago I submitted a ticket and made a post on their forums; I also sent an e-mail to the game company, and I have yet to see a single reply from any of them.

--EDIT--

Here's my code to run the multiple requests using ParallelCurl. The include file has been left untouched and is the same as shown here

set_time_limit(0);

require('path/to/parallelcurl.php');

$plyrs = array(); // normally an array of all the players I need to update

function on_request_done($content, $url, $ch, $player) {
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);    
    if ($httpcode !== 200) {
        echo 'Could Not Find '.$player.'<br />';
        return;
    } else {//player was found, store in db
        echo 'Updated '.$player.'<br />';
    }
}

$max_requests = 5;

$curl_options = array(
    CURLOPT_SSL_VERIFYPEER => FALSE,
    CURLOPT_SSL_VERIFYHOST => FALSE,
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9', // note: => not , here, or the option is silently ignored
);

$parallel_curl = new ParallelCurl($max_requests, $curl_options);

foreach ($plyrs as $p) {
    $search_url = "http://website.com/".urlencode($p);
    $parallel_curl->startRequest($search_url, 'on_request_done', $p);
    usleep(300); // now that I think about it, does this actually do anything worthwhile positioned here?
}

$parallel_curl->finishAllRequests();
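For reference on that `usleep(300)` question: `usleep()` takes microseconds, so 300 is 0.0003 seconds - effectively no delay at all. A sketch of what a meaningful per-request gap would look like in the same loop (the 500 ms value is purely illustrative, not a recommendation):

```php
<?php
// usleep() takes MICROseconds: 300 is 0.0003 s, i.e. no real throttling.
// A deliberate half-second gap between queued requests would be:
$delay_microseconds = 500000; // 500 ms -- hypothetical value, tune as needed

foreach ($plyrs as $p) {
    $search_url = "http://website.com/" . urlencode($p);
    $parallel_curl->startRequest($search_url, 'on_request_done', $p);
    usleep($delay_microseconds); // spaces out when each request is queued
}

$parallel_curl->finishAllRequests();
```

Note this only spaces out when requests are *started*; with `$max_requests` parallel slots, several can still be in flight at once.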

Here's the code I use to simply see if I can connect or not

$ch = curl_init();

$options = array(
    CURLOPT_URL            => $url,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER         => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_ENCODING       => "",
    CURLOPT_AUTOREFERER    => true,
    CURLOPT_CONNECTTIMEOUT => 120,
    CURLOPT_TIMEOUT        => 120,
    CURLOPT_MAXREDIRS      => 10,
    CURLOPT_SSL_VERIFYPEER => false,
    CURLOPT_SSL_VERIFYHOST => false,
);
curl_setopt_array( $ch, $options );
$response = curl_exec($ch); 
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

print_r(curl_getinfo($ch));

if ( $httpCode != 200 ){
    echo "Return code is {$httpCode} \n"
        .curl_error($ch);
} else {
    echo "<pre>".htmlspecialchars($response)."</pre>";
}

curl_close($ch);

Running that while I'm unable to connect results in this:

Array ( [url] => http://urlicantgetto.com/ [content_type] => [http_code] => 0 [header_size] => 0 [request_size] => 121 [filetime] => -1 [ssl_verify_result] => 0 [redirect_count] => 0 [total_time] => 30.073574 [namelookup_time] => 0.003384 [connect_time] => 0.025365 [pretransfer_time] => 0.025466 [size_upload] => 0 [size_download] => 0 [speed_download] => 0 [speed_upload] => 0 [download_content_length] => -1 [upload_content_length] => 0 [starttransfer_time] => 30.073523 [redirect_time] => 0 ) Return code is 0 Empty reply from server

This sounds like it's a network or firewall issue, rather than a PHP/code issue.

Either HostGator is blocking your outbound connections because the spike in outbound traffic could be misinterpreted as a small DoS attack, or the game website is blocking you for the same reason - especially since this only started once the number of requests increased. The HTTP status code of 0 also suggests firewall behaviour.

Alternatively, perhaps the connections aren't being closed properly after the cURL requests, and later on when you try to load that website or download a file you can't, because there are already too many open connections from your server.
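If lingering connections are the problem, plain cURL has options to discourage reuse and force things closed. A minimal sketch (the options shown are real cURL constants; whether they help here depends on what's actually holding the connections open):

```php
<?php
// Sketch: make each request open a fresh connection and close it when done,
// rather than leaving it in cURL's connection pool.
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FORBID_REUSE   => true, // close the connection after this transfer
    CURLOPT_FRESH_CONNECT  => true, // don't reuse a cached connection
));
$response = curl_exec($ch);
curl_close($ch); // always release the handle explicitly
```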

If you have SSH access to your server I might be able to help you debug whether it's the open-connections problem; otherwise you'll need to speak to HostGator and the game website owners to see if either party is blocking you.

Another solution might be to scrape the game website slower (introduce a wait time between requests) to avoid being flagged as high network traffic.
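A sketch of what that slower scraping could look like: sequential requests with a fixed pause, plus a longer back-off when the connection drops to the `http_code => 0` state you're seeing. The 2-second and 60-second values are illustrative, not limits published by the game site:

```php
<?php
// Hypothetical throttled loop: one request at a time, with pauses.
$pause_seconds   = 2;  // gap between normal requests -- illustrative value
$backoff_seconds = 60; // longer wait after a dead connection -- illustrative value

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code === 0) {
        // No response at all -- likely throttled or blocked, so back off.
        sleep($backoff_seconds);
        continue;
    }

    // ...store $body in the database...
    sleep($pause_seconds); // spread requests out so traffic looks less bursty
}
```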
