PHP5 cURL - When attempting to scrape a page, it loads a blank page

I'm trying to scrape some recipes from a page to use as samples for a school project, but the page just keeps loading as a blank page.

I'm following this tutorial: here

This is my code:

<?php

function curl($url) {
    $ch = curl_init();  // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url);    // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}
function scrape_between($data, $start, $end){
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start));  // Stripping $start
    $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
    return $data;   // Returning the scraped data from the function
}

$continue = true;

$url = curl("https://www.justapinch.com/recipes/main-course/");

while ($continue == true) {
    $results_page = curl($url);
    $results_page = scrape_between($results_page,"<div id=\"grid-normal\">","<div id=\"rightside-content\"");
    $separate_results = explode("<h3 class=\"tight-margin\"",$results_page);

    foreach ($separate_results as $separate_result) {
        if ($separate_result != "") {
            $results_urls[] = "https://www.justapinch.com" . scrape_between($separate_result,"href=\"","\" class=\"");
        }
    }

    // Commented out to test code above

    // if (strpos($results_page,"Next Page")) {
    //     $continue = true;
    //     $url = scrape_between($results_page,"<nav><div class=\"col-xs-7\">","</div><nav>");
    //     if (strpos($url,"Back</a>")) {
    //         $url = scrape_between($url,"Back</a>",">Next Page");
    //     }
    //     $url = "https://www.justapinch.com" . scrape_between($url, "href=\"", "\"");
    // } else {
    //     $continue = false;
    // }
    // sleep(rand(3,5));

    print_r($results_urls);
}
?>

I'm using Cloud9, I've installed the PHP5 cURL extension, and I'm running apache2. I would appreciate any help.

This is where the problem lies:

$results_page = curl($url);

You are trying to fetch content not from a URL but from an HTML page. Right before the while() loop, you set $url to the result of curl(), i.e. the page's HTML, so curl($url) inside the loop is handed that HTML as if it were a URL. I think you should do the following:

$results_page = curl("https://www.justapinch.com/recipes/main-course/");
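Keeping the URL and the fetched HTML in separate variables fixes the mix-up, but the loop also needs a way out: with the pagination block commented out, nothing ever sets $continue to false, so the question's while loop never ends on its own. Here is a minimal sketch of that structure, assuming hypothetical names (crawl_pages, $fetch, $findNext, $maxPages) that do not appear in the question: $fetch stands in for the question's curl() helper and $findNext for the commented-out "Next Page" logic.

```php
<?php
// Sketch of the corrected loop structure. crawl_pages(), $fetch and $findNext are
// hypothetical: $fetch plays the role of the question's curl() helper and
// $findNext the commented-out "Next Page" logic (it returns null on the last page).
function crawl_pages(callable $fetch, string $startUrl, callable $findNext, int $maxPages = 20): array
{
    $pages = [];
    $url = $startUrl;                   // $url only ever holds a URL...
    while ($url !== null && count($pages) < $maxPages) {
        $html = $fetch($url);           // ...and the fetched HTML gets its own variable
        if ($html === false) {
            throw new RuntimeException("Fetching $url failed");
        }
        $pages[] = $html;
        $url = $findNext($html);        // null: no "Next Page" link, so stop
    }
    return $pages;                      // at most $maxPages entries, even if links loop
}
```

Passing the fetcher in as a callable is only there so the loop can be exercised with canned pages; in the real script it would be the curl() helper, and the $maxPages cap keeps a broken "Next Page" link from looping forever.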

Edit:

You should change how you query the HTML: use the DOM (e.g. PHP's DOMDocument) instead of string matching.
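A minimal sketch of the DOM approach with PHP's built-in DOMDocument and DOMXPath classes. The function name extract_recipe_urls is hypothetical, and the markup it targets (links inside h3 elements with class "tight-margin" under the div with id "grid-normal") is an assumption inferred from the strings the question's code searches for; the live page may be structured differently.

```php
<?php
// Sketch: pull recipe links out of the listing HTML with DOMDocument/DOMXPath.
// ASSUMPTION: links sit in <a href> elements inside <h3 class="tight-margin">
// under <div id="grid-normal">, as inferred from the question's string markers.
function extract_recipe_urls(string $html, string $base = 'https://www.justapinch.com'): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's parser warnings off the page
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query(
        '//div[@id="grid-normal"]'
        . '//h3[contains(concat(" ", normalize-space(@class), " "), " tight-margin ")]'
        . '//a[@href]'
    );

    $urls = [];
    foreach ($links as $a) {
        $href = $a->getAttribute('href');
        // Prefix site-relative links ("/recipes/...") with the site root
        $isRelative = substr($href, 0, 1) === '/' && substr($href, 0, 2) !== '//';
        $urls[] = $isRelative ? $base . $href : $href;
    }
    return $urls;
}
```

Unlike the scrape_between()/explode() chain, the XPath query does not care about attribute order, whitespace, or extra classes on the h3, and it returns an empty array rather than a half-sliced string when the structure is not found.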

Why do people do this? They write code completely void of error checking, then go to some forum and ask why this code, which ignores any and all errors, isn't working. I DON'T FKING KNOW, BUT AT LEAST YOU COULD ADD SOME ERROR CHECKING AND RUN IT BEFORE ASKING. It's not just you; lots of people do it, it's annoying af, and you should all feel bad about it.

curl_setopt() returns bool(false) if there's an error setting the option. curl_exec() returns bool(false) if there was an error in the transfer. curl_init() returns bool(false) if there was an error creating the cURL handle. Extract the error description with curl_error() and report it with \RuntimeException.

Now delete this thread, add some error checking, and if the error checking does not reveal the problem, or it does but you're not sure how to fix it, THEN make a new thread about it.

Here are some error-checking function wrappers to get you started:

function ecurl_setopt(/*resource*/ $ch, int $option, /*mixed*/ $value): bool {
    $ret = curl_setopt($ch, $option, $value);
    if ($ret !== true) {
        // The failing option should be obvious from the stack trace
        throw new RuntimeException('curl_setopt() failed. curl_errno: ' . return_var_dump(curl_errno($ch)) . '. curl_error: ' . curl_error($ch));
    }
    return true;
}
// Returns what curl_exec() returned: the response body as a string when
// CURLOPT_RETURNTRANSFER is set, otherwise true. Throws on failure.
function ecurl_exec(/*resource*/ $ch) {
    $ret = curl_exec($ch);
    // Only false signals failure; an empty string is a valid (empty) response
    if ($ret === false) {
        throw new RuntimeException('curl_exec() failed. curl_errno: ' . return_var_dump(curl_errno($ch)) . '. curl_error: ' . curl_error($ch));
    }
    return $ret;
}

// Captures var_dump() output as a string instead of printing it
function return_var_dump(/*...*/) {
    $args = func_get_args();
    ob_start();
    call_user_func_array('var_dump', $args);
    return ob_get_clean();
}
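For comparison, here is a sketch of the question's curl() helper with the same checks folded in; the name curl_checked is hypothetical. It keeps CURLOPT_RETURNTRANSFER, so a successful curl_exec() yields the body as a string, and a failed call surfaces as a RuntimeException carrying curl_error()'s description instead of a silent false that later turns into a blank page.

```php
<?php
// Hypothetical drop-in for the question's curl() helper with error checking added.
// Returns the response body; throws RuntimeException (with curl_error() text) on failure.
function curl_checked(string $url): string
{
    $ch = curl_init();
    if ($ch === false) {
        // There is no handle yet, so curl_error() has nothing to report
        throw new RuntimeException('curl_init() failed');
    }
    try {
        if (!curl_setopt($ch, CURLOPT_URL, $url)
            || !curl_setopt($ch, CURLOPT_RETURNTRANSFER, true)) {
            throw new RuntimeException('curl_setopt() failed: ' . curl_error($ch));
        }
        $data = curl_exec($ch);
        if ($data === false) {
            throw new RuntimeException(sprintf(
                'curl_exec() failed for "%s" (errno %d): %s',
                $url,
                curl_errno($ch),
                curl_error($ch)
            ));
        }
        return $data; // a string, because CURLOPT_RETURNTRANSFER is set
    } finally {
        curl_close($ch);
    }
}
```

With this in place, calling it on the question's $url (which holds the page's HTML at that point) should stop with an exception describing a malformed URL or an unresolvable host, rather than quietly printing nothing.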

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.