Attempt to get HTML from website using PHP cURL does not work

I am trying to write a script that retrieves the HTML from my school's schedule search webpage. The page loads normally in a browser, but when I request it with cURL, I get the HTML of the page the site redirects me to instead. When I changed CURLOPT_FOLLOWLOCATION from true to false, the script printed only the response headers and no page content.

For reference, my PHP code is:

<?php
$curl_connection = curl_init('https://www.registrar.usf.edu/ssearch/');

curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");

$result = curl_exec($curl_connection);

print $result;

?>

The page I am trying to fetch with cURL is https://www.registrar.usf.edu/ssearch/ or https://www.registrar.usf.edu/ssearch/search.php

Any ideas?

I added two more lines, which save the cookies the site sets. Those cookies decide whether you get redirected when you try to scrape the schedule page.

$curl_connection = curl_init();
$url = "https://www.registrar.usf.edu/ssearch/search.php";
curl_setopt($curl_connection, CURLOPT_URL, $url);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt');  // file that received cookies are written to when the handle is closed
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt'); // file that cookies are read from and sent back to the site
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
echo $result;

Also, I haven't seen anyone pass the URL to curl_init before, so I set it with CURLOPT_URL instead. (curl_init does accept an optional URL argument, so either form works.)
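Because the snippet above sets CURLOPT_HEADER to true, $result contains the response headers followed by the HTML. If only the HTML is wanted, one option is to cut the string at the header size that cURL reports. The following is a minimal sketch under that assumption; split_response and fetch_body are hypothetical helper names, not part of the original answer.

```php
<?php
// Hypothetical helper: split a raw response fetched with CURLOPT_HEADER
// enabled into its header section and its body.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        // The cast covers older PHP versions, where substr() can return false.
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Hypothetical wrapper showing where the header size comes from.
function fetch_body(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');

    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException(curl_error($ch));
    }

    // Total size of all header blocks received, including those of redirects.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);

    return split_response($raw, $headerSize)['body'];
}
```

Calling fetch_body('https://www.registrar.usf.edu/ssearch/search.php') would then return only the HTML, assuming the request succeeds. Removing the CURLOPT_HEADER line altogether is simpler when the headers are not needed.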

Here is the cookie file:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

www.registrar.usf.edu   FALSE   /   FALSE   0   PHPSESSID   eied78t0v1qlqcop0rdk214361
www.registrar.usf.edu   FALSE   /ssearch/   FALSE   1336718465  cookie_test cookie_set
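To check which cookies the site actually set, the jar file can be read back in PHP. The sketch below assumes the file on disk uses the standard Netscape layout of seven tab-separated fields per line (the listing above shows them with spaces); parse_cookie_jar is a hypothetical helper name.

```php
<?php
// Hypothetical helper: parse the text of a Netscape-format cookie jar
// (as written by CURLOPT_COOKIEJAR) into an array of cookies.
function parse_cookie_jar(string $text): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $httpOnly = false;
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            // libcurl marks HttpOnly cookies with this prefix.
            $line = substr($line, 10);
            $httpOnly = true;
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }

        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }

        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain'     => $domain,
            'subdomains' => $subdomains === 'TRUE',
            'path'       => $path,
            'secure'     => $secure === 'TRUE',
            'expires'    => (int) $expires, // 0 means a session cookie
            'name'       => $name,
            'value'      => $value,
            'httponly'   => $httpOnly,
        ];
    }
    return $cookies;
}
```

For example, print_r(parse_cookie_jar(file_get_contents('cookie.txt'))); would list the PHPSESSID and cookie_test entries shown above, assuming cookie.txt is in the working directory.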

If you ever need to debug a cURL request that isn't working, start with var_dump(curl_getinfo($curl_connection)); and then check curl_error($curl_connection);
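The array returned by curl_getinfo is long, so it can help to reduce it to the few fields that matter for a redirect problem like this one. The following is a rough sketch; describe_transfer is a hypothetical helper name, and it relies only on the http_code, redirect_url, redirect_count and url keys of the curl_getinfo array.

```php
<?php
// Hypothetical helper: summarize the output of curl_getinfo() and
// curl_error() as a one-line diagnosis.
function describe_transfer(array $info, string $error): string
{
    if ($error !== '') {
        // cURL itself failed (DNS, TLS, timeout, ...).
        return "transport error: $error";
    }

    $code = $info['http_code'] ?? 0;
    if ($code >= 300 && $code < 400) {
        // A redirect that was not followed: headers only, no page content.
        $target = $info['redirect_url'] ?? '';
        return "redirect ($code) to " . ($target !== '' ? $target : 'unknown location');
    }

    $hops = $info['redirect_count'] ?? 0;
    return "HTTP $code after $hops redirect(s), final URL " . ($info['url'] ?? '');
}
```

After curl_exec, a call such as echo describe_transfer(curl_getinfo($curl_connection), curl_error($curl_connection)); would show whether the response was an unfollowed redirect, which would be consistent with a headers-only result when CURLOPT_FOLLOWLOCATION is false.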

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address.