Attempt to get HTML from a website using PHP cURL does not work
I am attempting to write a script that retrieves the HTML from my school's schedule search webpage. I can open the page normally in a browser, but when I request it with cURL, I get the HTML of the page it redirects to instead. When I changed the CURLOPT_FOLLOWLOCATION option from true to false, it only outputs a blank page with the headers sent.
For reference, my PHP code is:
<?php
$curl_connection = curl_init('https://www.registrar.usf.edu/ssearch/');
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
print $result;
?>
The page I am trying to fetch with cURL is https://www.registrar.usf.edu/ssearch/ or https://www.registrar.usf.edu/ssearch/search.php
Any ideas?
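The blank page with headers is consistent with a redirect response: a 3xx reply usually carries no body and names its target in a Location header. Below is a minimal sketch for reading that header out of the raw output when CURLOPT_HEADER is true and CURLOPT_FOLLOWLOCATION is false. The helper name describeRedirect is hypothetical, and the sample response is made up rather than captured from the real site.

```php
<?php
// Hypothetical helper: pull the status code and Location header out of a
// raw header block, such as the string curl_exec() returns when
// CURLOPT_HEADER is true and CURLOPT_FOLLOWLOCATION is false.
function describeRedirect(string $rawHeaders): array
{
    $status = null;
    $location = null;
    foreach (preg_split('/\r\n|\n|\r/', trim($rawHeaders)) as $line) {
        if ($status === null && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];          // first status line seen
        } elseif (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return ['status' => $status, 'location' => $location];
}

// Made-up sample response (assumption, not real output from the site).
$sample = "HTTP/1.1 302 Found\r\nLocation: /example/redirect-target\r\nContent-Length: 0\r\n\r\n";
$info = describeRedirect($sample);
```

In the script above, passing $result to such a helper would show where the server is trying to send the client, which is often a hint about what the first request was missing.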
I added two more lines so that cURL now saves cookies; those cookies decide whether you get redirected when you try to scrape the schedule page.
$curl_connection = curl_init();
$url = "https://www.registrar.usf.edu/ssearch/search.php";
curl_setopt($curl_connection, CURLOPT_URL, $url);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt'); // cookie jar: received cookies are written here
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt'); // cookie file: cookies are read from here and sent back to the site
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
echo $result;
Also, I haven't seen anyone putting URLs in curl_init yet.
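Because CURLOPT_HEADER is true in the code above, $result holds the headers of every hop followed by the final body. One way to separate them, sketched here under the assumption that the header byte count comes from curl_getinfo($curl_connection, CURLINFO_HEADER_SIZE) after curl_exec(), is shown below. The helper name splitResponse and the sample response are hypothetical.

```php
<?php
// Hypothetical helper: separate headers from body using the byte count
// that CURLINFO_HEADER_SIZE reports (it covers the headers of all hops).
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Made-up response: one redirect hop followed by the final page.
$hop   = "HTTP/1.1 302 Found\r\nLocation: /example/\r\n\r\n";
$final = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$raw   = $hop . $final . "<html>example</html>";

// In a real script the size would come from curl_getinfo(), not strlen().
$parts = splitResponse($raw, strlen($hop) + strlen($final));
```

If the headers are not needed at all, setting CURLOPT_HEADER to false avoids the split entirely.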
Here is the cookie file:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
www.registrar.usf.edu FALSE / FALSE 0 PHPSESSID eied78t0v1qlqcop0rdk214361
www.registrar.usf.edu FALSE /ssearch/ FALSE 1336718465 cookie_test cookie_set
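To check from a script which cookies the site actually set, the jar can be read back. The sketch below assumes the Netscape layout shown above (domain, subdomain flag, path, secure flag, expiry, name, value); parseCookieJar is a hypothetical helper, not a cURL function. Real jars are tab-separated, so the sample string uses tabs.

```php
<?php
// Hypothetical parser for a Netscape-format cookie jar such as the
// cookie.txt written by CURLOPT_COOKIEJAR. Splitting on any whitespace
// accepts both tab-separated jars and space-separated listings.
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $contents) as $line) {
        $line = trim($line);
        // libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix; keep those.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $f = preg_split('/\s+/', $line, 7);
        if (count($f) < 6) {
            continue; // not a complete cookie record
        }
        $cookies[$f[5]] = [
            'domain'  => $f[0],
            'path'    => $f[2],
            'secure'  => $f[3] === 'TRUE',
            'expires' => (int) $f[4],   // 0 means a session cookie
            'value'   => $f[6] ?? '',   // value may be empty
        ];
    }
    return $cookies;
}

// Same two cookies as in the listing above.
$jar = "# Netscape HTTP Cookie File\n"
     . "www.registrar.usf.edu\tFALSE\t/\tFALSE\t0\tPHPSESSID\teied78t0v1qlqcop0rdk214361\n"
     . "www.registrar.usf.edu\tFALSE\t/ssearch/\tFALSE\t1336718465\tcookie_test\tcookie_set\n";
$cookies = parseCookieJar($jar);
```

With a real jar, file_get_contents('cookie.txt') would supply the input; an empty result after a request suggests the cookies were never saved.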
If you ever want to debug cURL code that isn't working, start with
var_dump(curl_getinfo($curl_connection));
and the next thing to check is
curl_error($curl_connection);
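The two checks can be folded into one readable line. This is only a sketch: summarizeTransfer is a hypothetical helper, the sample values are made up, and it assumes the array has the keys curl_getinfo() normally returns (http_code, redirect_count, url, redirect_url).

```php
<?php
// Hypothetical helper: turn the array from curl_getinfo($ch) and the
// string from curl_error($ch) into a one-line diagnosis.
function summarizeTransfer(array $info, string $error): string
{
    if ($error !== '') {
        return "cURL error: $error";
    }
    $code = $info['http_code'] ?? 0;
    $hops = $info['redirect_count'] ?? 0;
    $url  = $info['url'] ?? '';
    if ($code >= 300 && $code < 400) {
        // A 3xx final status means the redirect was not followed.
        return "HTTP $code: redirect not followed, next URL is "
             . ($info['redirect_url'] ?? 'unknown');
    }
    return "HTTP $code from $url after $hops redirect(s)";
}

// Made-up values shaped like curl_getinfo() output.
$summary = summarizeTransfer(
    ['http_code' => 200, 'redirect_count' => 1, 'url' => 'https://example.com/final'],
    ''
);
```

In the script above it would be called after curl_exec() as summarizeTransfer(curl_getinfo($curl_connection), curl_error($curl_connection)).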