PHP cURL failed to load response data

I am trying to scrape data with PHP, but the URL I need to access requires POST data.

<?php 

//set POST variables
$url = 'https://www.ncaa.org/';
//$url = 'https://web3.ncaa.org/hsportal/exec/hsAction?hsActionSubmit=searchHighSchool';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// This sets the number of fields to post
curl_setopt($curl,CURLOPT_POST, sizeof($data_to_post));

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);

?>

When I request the second $url, where the actual information is hosted, it returns "failed to load response data", yet the same script can fetch the NCAA home page. Is there a reason why I get "failed to load response data" even though I am sending the correct form data?

The site apparently checks for a recognized user agent. By default, PHP's cURL extension doesn't send a User-Agent header. Add

curl_setopt($curl, CURLOPT_USERAGENT, 'curl/7.21.4');

and the script returns a response. In this case, however, the response says that the site requires a newer browser than the one you have. So you should copy the user agent string from a real browser, e.g.

curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

Also, the site requires the parameters to be sent in application/x-www-form-urlencoded format. When you pass an array as the argument to CURLOPT_POSTFIELDS, cURL uses multipart/form-data. So change that line to:

curl_setopt($curl,CURLOPT_POSTFIELDS, http_build_query($data_to_post));

to convert the array to a URL-encoded string.
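As a quick illustration of that conversion, the snippet below prints the string that http_build_query() produces for the same form fields used in the question. It is only a sketch for inspecting the encoded body and makes no network request.

```php
<?php
// Same form fields as in the question.
$data_to_post = array(
    'hsCode'         => '332680',
    'state'          => '',
    'city'           => '',
    'name'           => '',
    'hsActionSubmit' => 'Search',
);

// Passing this string (instead of the array) to CURLOPT_POSTFIELDS makes
// cURL send the body as application/x-www-form-urlencoded.
$encoded = http_build_query($data_to_post);

echo $encoded, "\n";
// prints: hsCode=332680&state=&city=&name=&hsActionSubmit=Search
```

Empty strings are kept as empty fields (for example `state=`), so the server still sees every form field.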

And in the URL, leave out ?hsActionSubmit=searchHighSchool, as that parameter is sent in the POST fields.
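If you would rather not edit the URL by hand, one possible way to separate the endpoint from its query string is sketched below with parse_url() and parse_str(). The variable names $endpoint and $query_params are illustrative only and are not part of the original answer.

```php
<?php
// The URL from the question, including the query string.
$url = 'https://web3.ncaa.org/hsportal/exec/hsAction?hsActionSubmit=searchHighSchool';

// Split the URL into its components.
$parts = parse_url($url);

// Rebuild the endpoint without the query string.
$endpoint = $parts['scheme'] . '://' . $parts['host'] . $parts['path'];

// Decode the query string into an array (empty if there is none).
$query_params = array();
parse_str(isset($parts['query']) ? $parts['query'] : '', $query_params);

echo $endpoint, "\n";
// prints: https://web3.ncaa.org/hsportal/exec/hsAction
print_r($query_params);
```

$endpoint can then be passed to CURLOPT_URL, while anything found in $query_params can be reviewed and, if needed, merged into the POST fields.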

The final, working script looks like this:

<?php
//set POST variables
//$url = 'https://www.ncaa.org/';
$url = 'https://web3.ncaa.org/hsportal/exec/hsAction';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// Send the request as an HTTP POST (CURLOPT_POST takes a boolean, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// The fields to post, encoded as an application/x-www-form-urlencoded string
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($data_to_post));

// Send a real browser's User-Agent string
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');
//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);
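When a request like this fails, it helps to see whether the problem is at the transport level or in the HTTP response. The helper below is a minimal sketch of that idea, not part of the original answer: the function name post_form() and the keys of the returned array are hypothetical. It sets CURLOPT_RETURNTRANSFER so the body is returned instead of printed, and a connect timeout so a dead endpoint fails quickly.

```php
<?php
// Hypothetical helper: POSTs URL-encoded form fields and reports
// transport errors and the HTTP status instead of failing silently.
function post_form($url, array $fields, $user_agent)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
    // Return the response body from curl_exec() instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Give up after 5 seconds if the connection cannot be established.
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5);

    $body = curl_exec($curl);

    // curl_exec() returns false on a transport-level failure;
    // curl_error() then holds a human-readable message.
    // The handle is released when $curl goes out of scope.
    return array(
        'ok'     => $body !== false,
        'status' => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'error'  => curl_error($curl),
        'body'   => $body === false ? '' : $body,
    );
}
```

Calling post_form($url, $data_to_post, $user_agent) with the values from the script above and inspecting 'status' and 'error' should show whether the request was rejected by the server or never completed at all.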

For HTTPS connections, you may need to turn off a specific cURL option, CURLOPT_SSL_VERIFYPEER, which disables verification of the server's certificate:

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// This option must be FALSE to skip peer certificate verification
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);

// Send the request as an HTTP POST (CURLOPT_POST takes a boolean, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);
