
Can't get the HTML DOM with PHP cURL

I have used PHP cURL to fetch the HTML and echo it. But when I try it with this code, the page suddenly redirects.

    $cookie = tempnam ("/tmp", "CURLCOOKIE");  
    $ch = curl_init(); 

  function get_data( $ch, $url, $post, $cookie ){
    $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7"; 
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    //curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); 
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); 
    curl_setopt($ch, CURLOPT_HEADER, 0); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 1); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    if( $post != '' ) 
      curl_setopt($ch, CURLOPT_POSTFIELDS, $post); 
    return curl_exec($ch); 
  }
  $url = 'https://iapps.courts.state.ny.us/webcivil/FCASSearch?param=I';
  $html = get_data( $ch, $url, '', '' );
  echo $html; exit;

I have experimented with these options:

CURLOPT_RETURNTRANSFER,
CURLOPT_FOLLOWLOCATION,
CURLOPT_COOKIEJAR,
CURLOPT_COOKIEFILE

But I still get redirected when I try to get the HTML. How can I get the page's HTML, or is there anything else I can try?

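Here is fixed, working code that grabs the page's source. The key changes are that CURLOPT_RETURNTRANSFER is set to true, so curl_exec() returns the HTML as a string instead of printing it straight to the browser, and that the result is passed through htmlspecialchars(), so the browser displays the markup as text instead of rendering it and running its scripts.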

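  // Temporary file used to store and resend cookies between requests
  $cookie = tempnam("/tmp", "CURLCOOKIE");
  $ch = curl_init();

  function get_data($curl, $url, $post, $cookie) {
    $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7";
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, $agent);
    curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie);    // read cookies from this file
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie);     // save cookies to this file when the handle is released
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);  // do not follow HTTP redirects
    curl_setopt($curl, CURLOPT_HEADER, false);          // leave response headers out of the body
    // Disabling peer verification is insecure; pointing CURLOPT_CAINFO at a CA bundle is preferable.
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
    if ($post != '')
      curl_setopt($curl, CURLOPT_POSTFIELDS, $post);
    return curl_exec($curl);
  }

  $url = 'https://iapps.courts.state.ny.us/webcivil/FCASSearch?param=I';
  $html = get_data($ch, $url, '', $cookie);  // pass the cookie file, not ''
  if ($html === false) {
    die('cURL error: ' . curl_error($ch));
  }
  curl_close($ch);
  // Escape the markup so the browser shows it as text instead of executing it
  echo htmlspecialchars($html);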
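If you want to confirm where the redirect comes from, you can inspect the transfer information cURL records for the request. This is a minimal sketch, assuming the curl extension is available; fetch_with_info() is a hypothetical helper name, not part of cURL. With redirects not followed, a 3xx status means the server itself redirected, while a 200 suggests that any redirect you see in the browser most likely comes from the page's own markup or scripts.

```php
<?php
// Hypothetical helper: fetch a URL without following redirects and return
// the body together with the status code and any redirect target cURL saw.
function fetch_with_info(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => false,  // keep 3xx responses visible
        CURLOPT_HEADER         => false,  // body only, no response headers
    ]);
    $body = curl_exec($ch);
    $error = ($body === false) ? curl_error($ch) : '';
    $info = [
        'body'         => $body,
        'error'        => $error,
        // 3xx: the server redirected; 200: the page was served directly.
        'http_code'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        // Target of a server-side redirect, or false if there was none.
        'redirect_url' => curl_getinfo($ch, CURLINFO_REDIRECT_URL),
    ];
    curl_close($ch);
    return $info;
}
```

For example, $r = fetch_with_info($url); followed by var_dump($r['http_code'], $r['redirect_url']); shows whether the court site answered with a redirect or with the page itself.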

But have you looked at what the page actually returns? It is almost entirely JavaScript, which is not very useful to parse.
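Since the response is mostly JavaScript, it can help to load it into a DOM and check what is actually there before trying to scrape it. This is a minimal sketch, assuming the dom extension is enabled and that $html holds the string returned by curl_exec() with CURLOPT_RETURNTRANSFER enabled; summarize_html() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: parse fetched HTML and report how much of it is <script>.
function summarize_html(string $html): array
{
    // DOMDocument::loadHTML() rejects empty input, so handle it up front.
    if (trim($html) === '') {
        return ['scripts' => 0, 'script_chars' => 0, 'links' => []];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Count <script> elements and the amount of code they contain.
    $scripts = $doc->getElementsByTagName('script');
    $scriptChars = 0;
    foreach ($scripts as $script) {
        $scriptChars += strlen($script->textContent);
    }

    // Collect link targets, which are often what a scraper actually needs.
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }

    return [
        'scripts'      => $scripts->length,
        'script_chars' => $scriptChars,
        'links'        => $links,
    ];
}
```

For example, calling print_r(summarize_html($html)) after get_data() shows how many script blocks the page contains and whether there are any ordinary links left to follow.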

You can take inspiration from this code. Set $live_url to the URL of the page whose HTML you want to fetch.

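$live_url = "http://www.example.com/page/header.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $live_url);
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);         // maximum number of seconds the request may take
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string instead of printing it
$content = curl_exec($ch);
$res = curl_getinfo($ch);                        // transfer details, e.g. $res['http_code'], $res['url']
if ($content === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);
echo $content;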

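Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM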