Sorry, my English is limited.
I am using this code:
<?php
function file_get_contents_curl ( $url ) {
$ch = curl_init ();
curl_setopt ( $ch, CURLOPT_AUTOREFERER, TRUE );
curl_setopt ( $ch, CURLOPT_HEADER, 0 );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt ( $ch, CURLOPT_URL, $url );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, TRUE );
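curl_setopt ( $ch, CURLOPT_SSL_VERIFYPEER, 0 ); // disables certificate verification; insecure, for testing only
curl_setopt ( $ch, CURLOPT_SSL_VERIFYHOST, 0 ); // disables host name verification; insecure, for testing only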
curl_setopt ( $ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; rv:71.0) Gecko/20100101 Firefox/71.0' ); // spoof
$data = curl_exec ( $ch );
curl_close ( $ch );
return $data;
}
include ( __DIR__ . '/simplehtmldom_1_9_1/simple_html_dom.php' );
// 1. OK: $url = 'https://www.p***hub.com/model/ashley-porner';
// 2. OK: $url = 'https://www.p***hub.com/model/ashley-diamond-and-diamond-king';
// 3. NOT OK: $url = 'https://www.p***hub.com/model/ambercashh';
// 4. NOT OK: $url = 'https://www.p***hub.com/model/autumn-raine';
$html = file_get_contents_curl ( $url );
$html = str_get_html ( $html );
var_dump ( $html ); // boolean(false) if NOT OK
?>
URLs 1 and 2 work, but URLs 3 and 4 do not: nothing is displayed and str_get_html() returns false.
I tried changing MAX_FILE_SIZE from 600000 to 6000000 (in ~/simplehtmldom_1_9_1/simple_html_dom.php), but with the new value the page takes much longer to load and then my website crashes:
// OLD: defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 600000);
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 6000000); // NEW
What is the problem?
Thanks.
As a test, you can run the following. The URLs will obviously need editing, but it shows reasonable performance, so the reason you were running out of memory must lie in code not included in the question.
<?php
function file_get_contents_curl ( $url ) {
$ch = curl_init ();
curl_setopt ( $ch, CURLOPT_AUTOREFERER, TRUE );
curl_setopt ( $ch, CURLOPT_HEADER, 0 );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt ( $ch, CURLOPT_URL, $url );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, TRUE );
curl_setopt ( $ch, CURLOPT_SSL_VERIFYPEER, 0 );
curl_setopt ( $ch, CURLOPT_SSL_VERIFYHOST, 0 );
curl_setopt ( $ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; rv:71.0) Gecko/20100101 Firefox/71.0' ); // spoof
$data = curl_exec ( $ch );
curl_close ( $ch );
return $data;
}
$start=time();
$memstart=memory_get_usage();
$baseurl='https://www.*******.com/model/';
$models=['ashley-porner','ashley-diamond-and-diamond-king','ambercashh','autumn-raine'];
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->recover=true;
$dom->strictErrorChecking=false;
/* do some expensive DOM operations to test performance */
$query='//section[ @class="topProfileHeader" ]/div/div/div[ @class="content-columns" ]/div[ @class="infoPiece" ]';
foreach( $models as $model ){
$url = $baseurl . $model;
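$res = file_get_contents_curl( $url );
if( $res === false || $res === '' ) continue; // skip failed or empty responses; loadHTML() cannot parse them
$dom->loadHTML( $res );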
$xp=new DOMXPath( $dom );
libxml_clear_errors();
$col=$xp->query( $query );
if( $col->length > 0 ){
foreach( $col as $node ) {
echo str_repeat( '.', strlen( $node->nodeValue ) ) . '<br />';
}
}
}
$memory=memory_get_usage() - $memstart;
printf(
'<div style="padding:1rem; border:1px solid red;">Script took approx: %ss - consumed: %sMb, Peak memory consumption: %sMb</div>',
( time() - $start ),
round( $memory / pow(1024,2), 2 ),
round( memory_get_peak_usage() / pow(1024,2), 2 )
);
?>
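To narrow down why str_get_html() returns false for some pages, it may help to inspect the fetched string before parsing it. As far as I can tell, str_get_html() in simple_html_dom 1.9.1 returns false when the input is empty or longer than MAX_FILE_SIZE, so a failed request and an oversized page look the same from the outside. The sketch below is only an illustration: explain_fetch_result() is a hypothetical helper, not part of simple_html_dom, and its default limit of 600000 is taken from the OLD define shown in the question.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): reports why a fetched
// page would probably be rejected before it is parsed.
function explain_fetch_result( $html, $maxSize = 600000 ) {
    if ( $html === false ) {
        return 'request failed';   // curl_exec() returns false on error
    }
    if ( $html === '' ) {
        return 'empty response';   // nothing to parse
    }
    $length = strlen( $html );     // size in bytes, not characters
    if ( $length > $maxSize ) {
        return 'too large: ' . $length . ' bytes';
    }
    return 'ok';
}
```

Calling `echo explain_fetch_result( file_get_contents_curl( $url ) );` for each of the four URLs should show whether the failing pages are simply larger than the limit or whether the request itself failed.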