
I have made a proxy scraper in PHP but I don't know how to check if the proxy is live

The code below scrapes proxies from the website. What I want is for the program to check the proxies one by one to see whether each is alive, and then save the live ones to a file. Can someone help me do this?

<?php

// Scrape proxies (IP:PORT) from pages 1 to 10 of the free proxy list.
header('Content-Type: text/plain'); // print_r() output is plain text, not JSON

$ch = curl_init();

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/'.rand(111,999).'.36 (KHTML, like Gecko) Chrome/88.0.'.rand(1111,9999).'.104 Safari/'.rand(111,999).'.36');

$proxies = array();
$firstPage = 1;
$lastPage = 10;
for ($i = $firstPage; $i <= $lastPage; $i++) {
    curl_setopt($ch, CURLOPT_URL, "https://www.my-proxy.com/free-proxy-list-$i.html");
    $result = curl_exec($ch);
    if ($result === false) {
        continue; // skip pages that failed to download
    }

    // Get proxies; each entry on the page looks like: >102.64.122.214:8085#U
    // The dots are escaped so they match a literal "." and the port may have 2 to 5 digits.
    preg_match_all('!\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}!', $result, $matches);
    $proxies = array_merge($proxies, $matches[0]);
}
curl_close($ch);
print_r($proxies);

?>

There are multiple ways to test a proxy. The easiest is to set the 'proxy' option in the stream context of a 'file_get_contents' request and fetch a URL whose response you already know:

// $prox holds a single IP:PORT string, e.g. one entry of the scraped $proxies array
$options = array(
    'http' => array(
        'proxy' => 'tcp://' . $prox, // IP:PORT info, e.g. 8.8.8.8:2222
        'timeout' => 2,              // seconds to wait before giving up
        'request_fulluri' => true,   // send the full URL in the request line, as proxies expect
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
            "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36\r\n"
    )
);
$context = stream_context_create($options);
$base_url = 'http://lotsofrandomstuff.com/1.php'; // URL that simply returns '1' each time
$web = @file_get_contents($base_url, false, $context); // returns false on failure
if ($web !== false && trim($web) === '1') {
    echo "proxy is good";
} else {
    echo "proxy is dead";
}
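
The question also asks to check the proxies one by one and save the live ones to a file. Here is a rough sketch of how that could look, using cURL's CURLOPT_PROXY option instead of a stream context. The function names isProxyAlive and saveLiveProxies, the test URL http://example.com/, the 5 second timeout and the output file name are all my own assumptions, not something from the question, so adjust them to your setup.

```php
<?php

/**
 * Return true if a GET request routed through $proxy (IP:PORT) comes back with HTTP 200.
 * $testUrl and $timeout are illustrative defaults (assumptions, not from the original post).
 */
function isProxyAlive(string $proxy, string $testUrl = 'http://example.com/', int $timeout = 5): bool
{
    $ch = curl_init($testUrl);
    curl_setopt_array($ch, [
        CURLOPT_PROXY          => $proxy,   // route the request through the proxy
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout, // give up on slow connects
        CURLOPT_TIMEOUT        => $timeout, // give up on slow transfers
    ]);
    $ok = curl_exec($ch) !== false;
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok && $status === 200;
}

/**
 * Check each proxy in turn and append the live ones to $outFile, one per line.
 * $isAlive can be swapped out, so the loop can be tried without network access.
 */
function saveLiveProxies(array $proxies, string $outFile, ?callable $isAlive = null): array
{
    $isAlive = $isAlive ?? 'isProxyAlive';
    $live = [];
    foreach (array_unique($proxies) as $proxy) {
        if ($isAlive($proxy)) {
            $live[] = $proxy;
            // Append right away so results survive if the script is interrupted
            file_put_contents($outFile, $proxy . PHP_EOL, FILE_APPEND | LOCK_EX);
        }
    }

    return $live;
}

// Usage with the scraped list (hypothetical file name):
// $live = saveLiveProxies($proxies, 'live_proxies.txt');
```

Checking one by one is slow, because every dead proxy costs a full timeout. With ten pages of proxies you will probably want a short timeout, or curl_multi if you need it faster.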
