Trying to get HTML from a website using PHP cURL is not working

Question

I am trying to write a script that retrieves the HTML from my school's schedule search page. The page loads normally in a browser, but when I request it with cURL, I get the HTML of the page it redirects to instead. When I changed CURLOPT_FOLLOWLOCATION from true to false, it only output a blank page with the headers that were sent.

For reference, my PHP code is:

<?php
$curl_connection = curl_init('https://www.registrar.usf.edu/ssearch/');

curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");

$result = curl_exec($curl_connection);

print $result;

?>

The page I am trying to fetch with cURL is https://www.registrar.usf.edu/ssearch/ or https://www.registrar.usf.edu/ssearch/search.php

Any ideas?

Answer 1

I added two more lines, which save the cookies. The site uses those cookies to decide whether to redirect you when you try to scrape the schedule page.

$curl_connection = curl_init();
$url = "https://www.registrar.usf.edu/ssearch/search.php";
curl_setopt($curl_connection, CURLOPT_URL, $url);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt');  // file that received cookies are written to
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt'); // file that cookies are read from and sent back to the site
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
echo $result;

Also, I haven't seen anyone pass the URL to curl_init before. (It does accept an optional URL argument, so either form works.)
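Because the snippet above sets CURLOPT_HEADER to true, $result holds the response headers followed by the HTML. If only the HTML is wanted, one option is to cut the string at the offset reported by CURLINFO_HEADER_SIZE. This is a minimal sketch; split_headers_and_body is a hypothetical helper name, not something from the original answer.

```php
<?php
// Hypothetical helper: split a raw cURL response (fetched with
// CURLOPT_HEADER = true) into its header text and its body.
// $headerSize is the value of curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_headers_and_body(string $raw, int $headerSize): array
{
    return [
        'headers' => (string) substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Usage with the handle from the snippet above (not executed here):
// $raw   = curl_exec($curl_connection);
// $size  = curl_getinfo($curl_connection, CURLINFO_HEADER_SIZE);
// $parts = split_headers_and_body($raw, $size);
// echo $parts['body'];
```

When redirects are followed, the header part contains one header block per hop, which can help show where the redirect chain goes.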

Here is the cookie file:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

www.registrar.usf.edu   FALSE   /   FALSE   0   PHPSESSID   eied78t0v1qlqcop0rdk214361
www.registrar.usf.edu   FALSE   /ssearch/   FALSE   1336718465  cookie_test cookie_set
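To confirm from PHP that the expected cookies were actually saved, the jar can be read back and parsed. The sketch below assumes the seven-column Netscape layout shown above (domain, subdomain flag, path, secure flag, expiry, name, value); parse_cookie_jar is a hypothetical helper. It splits on any whitespace for simplicity, so a cookie value that itself contains whitespace would need a stricter tab-based split.

```php
<?php
// Hypothetical helper: parse the text of a Netscape-format cookie jar
// (as written via CURLOPT_COOKIEJAR) into an array of cookies.
function parse_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $contents) as $line) {
        $line = trim($line);
        // libcurl marks HttpOnly cookies with this prefix; strip it
        // before the comment check so those lines are not skipped.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        }
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Simplification: split on runs of whitespace, at most 7 fields.
        $fields = preg_split('/\s+/', $line, 7);
        if (count($fields) < 6) {
            continue; // not a cookie line
        }
        $cookies[] = [
            'domain'  => $fields[0],
            'path'    => $fields[2],
            'secure'  => $fields[3] === 'TRUE',
            'expires' => (int) $fields[4], // Unix time; 0 = session cookie
            'name'    => $fields[5],
            'value'   => $fields[6] ?? '',
        ];
    }
    return $cookies;
}

// Usage (not executed here):
// print_r(parse_cookie_jar(file_get_contents('cookie.txt')));
```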

If you ever want to debug a cURL request that isn't working, start with var_dump(curl_getinfo($curl_connection)); and the next thing to check is curl_error($curl_connection);
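As a rough illustration of that debugging advice, the two calls can be folded into a one-line summary. summarize_transfer is a hypothetical helper; the keys it reads (http_code, url, redirect_count) are ones that curl_getinfo() returns in its array.

```php
<?php
// Hypothetical helper: condense curl_getinfo() output and the
// curl_error() string into a one-line summary for debugging.
function summarize_transfer(array $info, string $error): string
{
    if ($error !== '') {
        return 'cURL error: ' . $error;
    }
    return sprintf(
        'HTTP %d from %s after %d redirect(s)',
        $info['http_code'] ?? 0,
        $info['url'] ?? '(unknown)',
        $info['redirect_count'] ?? 0
    );
}

// Usage after curl_exec() (not executed here):
// echo summarize_transfer(
//     curl_getinfo($curl_connection),
//     curl_error($curl_connection)
// );
```

A non-zero redirect count together with an unexpected final URL would point to the redirect described in the question.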

Answer 1 (score 3, accepted, 2012-05-09 06:41:19)
