Trying to scrape kickasstorrents with simple html dom

I am trying to scrape kickasstorrents with Simple HTML DOM, but I am getting an error before I have even started parsing. I followed some Simple HTML DOM tutorials, set up my URL, and am fetching the page with cURL.

Code is as follows:

<?php
require('inc/config.php');
include_once('inc/simple_html_dom.php');

function scrap_kat() {

// initialize curl
$html = 'http://katcr.to/new/';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $html);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$ip=rand(0,255).'.'.rand(0,255).'.'.rand(0,255).'.'.rand(0,255);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/".rand(3,5).".".rand(0,3)." (Windows NT ".rand(3,5).".".rand(0,2)."; rv:2.0.1) Gecko/20100101 Firefox/".rand(3,5).".0.1");
$html2 = curl_exec($ch);
if($html2 === false)
{
    echo 'Curl error: ' . curl_error($ch);
}
else
{
    // create HTML DOM
    $kat = file_get_contents($html);
}
curl_close($ch);

// scripting starts




// clean up memory
$kat->clear();
unset($kat);
// return information
return $ret;

}
$ret = scrap_kat();
echo $ret;
?>

I receive this error:

Fatal error: Call to a member function clear() on resource in C:\wamp64\www\index.php on line 36

What am I doing wrong? Thanks.

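file_get_contents is a built-in PHP function. It returns the page as a plain string, and a string has no clear() method. Simple HTML DOM provides its own function, file_get_html, which returns a simple_html_dom object.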

Replace

$kat = file_get_contents($html);

with

$kat = file_get_html($html);

Also, why does the code in your question return $ret? There is no variable $ret inside your function scrap_kat().

You can return $kat instead of $ret. In that case, do not unset($kat) before returning it.

simple_html_dom is a class, and clear() is a method of that class (simple_html_dom_node has one as well). You can therefore only call clear() on an object created by Simple HTML DOM, not on the value returned by a native PHP function.

@Hassaan is correct. file_get_contents is a native PHP function; you have to create an object of the simple_html_dom class, like this:

$html = new simple_html_dom();

Then use the code below.

include_once('inc/simple_html_dom.php');

function scrap_kat() {
    $url = 'http://katcr.to/new/';
    // $timeout = 120;

    // Create the Simple HTML DOM object.
    $html = new simple_html_dom();

    // ---- cURL block: fetch the page as a string ----
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/".rand(3,5).".".rand(0,3)." (Windows NT ".rand(3,5).".".rand(0,2)."; rv:2.0.1) Gecko/20100101 Firefox/".rand(3,5).".0.1");
    // curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);
    $ip = rand(0,255).'.'.rand(0,255).'.'.rand(0,255).'.'.rand(0,255);
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
    $content = curl_exec($curl);
    if ($content === false) {
        // Nothing to parse if the request failed, so report the error and stop.
        echo 'Curl error: ' . curl_error($curl);
        curl_close($curl);
        return;
    }
    curl_close($curl);
    // ---- end cURL block ----

    // Note the variable change: load the string returned by cURL into the object.
    $html->load($content);
    // echo $ip;

    // Dump every <a> element found on the page.
    print_r($html->find('a'));

    // Clean up memory.
    $html->clear();
    unset($html);
}
scrap_kat();

There are a lot of errors in your code, so I am just showing you one way to do this. If you need an explanation, please comment below this answer and I will add one.
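
Both answers come down to the same idea: fetch the page as a string, then hand that string to a parser object before calling any methods on it. For readers who want to try the idea without the Simple HTML DOM library, here is a minimal sketch that swaps in PHP's bundled DOMDocument class instead. This is not the library's API. The helper name extract_links and the sample HTML are assumptions made up for illustration, and the hard-coded string stands in for the result of curl_exec().

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): return the href of
// every <a> element found in an HTML string.
function extract_links(string $html): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip anchors that have no href attribute.
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// Stand-in for the string returned by curl_exec(); the markup is made up.
$content = '<html><body><a href="/new/1">One</a><p>text</p><a href="/new/2">Two</a></body></html>';
print_r(extract_links($content));
```

The parsing step only ever sees a string, so it does not matter whether that string came from cURL, file_get_contents, or a test fixture. With Simple HTML DOM, the equivalent step is loading the string into a simple_html_dom object, as in the answer above.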
