I fetched page URLs using a PHP crawler. Now I want to count the total number of views (hits) on each page. How can I do this using PHP?
$main_url = "http://programming.com";
$str = file_get_contents($main_url);
$title = '';
// Gets Webpage Title
if ($str !== false && strlen($str) > 0)
{
    $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
    if (preg_match('/<title>(.*?)<\/title>/i', $str, $m)) // ignore case
    {
        $title = $m[1];
    }
}
// Gets Webpage Description (meta tags of the site's root URL)
$url = parse_url($main_url);
$tags = get_meta_tags($url['scheme'].'://'.$url['host']);
$description = $tags['description'] ?? '';
// Gets Webpage Internal Links
$doc = new DOMDocument;
libxml_use_internal_errors(true); // hide warnings from malformed HTML
$doc->loadHTML($str);
$sec_url = [];
foreach ($doc->getElementsByTagName('a') as $a)
{
    if ($a->hasAttribute('href')) // skip anchors without an href
    {
        $sec_url[] = $a->getAttribute('href');
    }
}
// $conn is an existing mysqli connection; a prepared statement avoids quoting problems in title/description
$stmt = mysqli_prepare($conn, "insert into datascience (link,title,description,internal_link) values (?,?,?,?)");
foreach ($sec_url as $value)
{
    mysqli_stmt_bind_param($stmt, 'ssss', $main_url, $title, $description, $value);
    mysqli_stmt_execute($stmt);
}
I've converted all of the various methods you're using to find the details (title etc.) to XPath queries within the loaded document. This just makes things consistent.
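A minimal, self-contained sketch of the title and description lookups done with XPath. It runs against an inline HTML string (a made-up sample standing in for the page returned by file_get_contents()), so the page content here is an assumption:

```php
<?php
// Sketch: read <title> and the meta description with DOMXPath.
// The HTML below is a hypothetical sample standing in for the fetched page.
$html = <<<HTML
<html><head>
<title>Tutorials - Example</title>
<meta name="description" content="Sample description">
</head><body></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($doc);
// string(...) yields '' rather than an error when the node is missing
$title = $xp->evaluate("string(//title)");
$description = $xp->evaluate("string(//meta[@name='description']/@content)");

echo $title, PHP_EOL;
echo $description, PHP_EOL;
```

Wrapping each query in string() means a page without a title or description gives an empty string instead of a null-object error.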
The main thing is to work out a consistent way of fetching the details. In the page you're using, each segment looks as though it's wrapped in an <article> tag. So first fetch all of these tags, then, using each one as a base, look for the various items you're after.
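Here is a small sketch of the "fetch every <article>, then search inside each one" approach. The markup is a hypothetical sample modelled on the structure described above (an h2 with class "title" and a span whose title attribute is "Views" inside each <article>); the real page may differ:

```php
<?php
// Sketch: treat each <article> as the base for per-item lookups.
// Hypothetical markup; the real page structure is an assumption.
$html = <<<HTML
<html><body>
<article><h2 class="title">First post</h2><span title="Views">1,031</span></article>
<article><h2 class="title">Second post</h2><span title="Views">390</span></article>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // older libxml2 warns about the unknown <article> tag
$doc->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($doc);

$items = [];
foreach ($doc->getElementsByTagName('article') as $article) {
    // Each query is evaluated relative to the current <article> node
    $items[] = [
        'title' => $xp->evaluate("string(descendant::h2[@class='title'])", $article),
        'views' => $xp->evaluate("string(descendant::span[@title='Views'])", $article),
    ];
}
print_r($items);
```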
Building the XPath expressions to locate the items within each <article> means you can pick out all of the relevant details per item. In XPath, you use the descendant axis (descendant::...) to indicate that you want the nodes inside the context node (passed in as the last parameter to evaluate()).
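The context-node point is easy to get wrong, so here is a short sketch (with made-up markup) contrasting an absolute // path with the descendant:: axis when a context node is passed to evaluate():

```php
<?php
// Sketch: a path starting with // is absolute and searches the whole document
// even when a context node is given; descendant:: is relative to that node.
$html = '<html><body>'
    . '<article><h2>A</h2></article>'
    . '<article><h2>B</h2></article>'
    . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($doc);

$second = $doc->getElementsByTagName('article')->item(1);

$absolute = $xp->evaluate('count(//h2)', $second);          // every h2 in the document
$relative = $xp->evaluate('count(descendant::h2)', $second); // only h2s inside $second
$text     = $xp->evaluate('string(descendant::h2)', $second);

echo $absolute, ' ', $relative, ' ', $text, PHP_EOL;
```

The relative form .//h2 behaves like descendant::h2 here; the leading dot is what ties it to the context node.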
$main_url = "http://programming.com";
$str = file_get_contents($main_url);
$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($str);
$xp = new DOMXPath($doc);
$title = $xp->evaluate("string(//title)");
$description = $xp->evaluate("string(//meta[@name='description']/@content)");
echo $title.PHP_EOL;
echo $description.PHP_EOL;
$articles = $doc->getElementsByTagName('article');
$pageArticles = [];
foreach ($articles as $article) {
    // Queries are relative to the current <article> (the context node)
    $articleTitle = $xp->evaluate("string(descendant::h2[@class='title'])", $article);
    $articleViews = $xp->evaluate("string(descendant::span[@title='Views'])", $article);
    $pageArticles[] = ["title" => $articleTitle, "views" => $articleViews];
}
print_r($pageArticles);
Which gave me this output:
Tutorials - Programming.com
Tap into the collective intelligence of researchers who are working on the same problems you are - right now.
Array
(
    [0] => Array
        (
            [title] => HTML Cheat Sheet
            [views] => 1,031
        )
    [1] => Array
        (
            [title] => Best Java Training Institutes In Noida
            [views] => 390
        )
    [2] => Array
        (
            [title] => Best Salesforce Training institutes in noida
            [views] => 329
        )
    [3] => Array
        (
            [title] => Top Quality Digital Marketing Training Institutes in Noida
            [views] => 382
        )
    [4] => Array
        (
            [title] => Make your studies with professional Best Oracle Training Institutes in Noida
            [views] => 308
        )
    [5] => Array
        (
            [title] => Create a Unique Project with a Best Linux Training Institutes in Noida
            [views] => 374
        )
    [6] => Array
        (
            [title] => Webtrackker Technology Best Dot Net Training Institutes Available To Guide the Students
            [views] => 385
        )
    [7] => Array
        (
            [title] => Availability of My University Help Offers Great Benefit to Students
            [views] => 430
        )
    [8] => Array
        (
            [title] => Webtrackker Institute of Professional Studies: Hadoop Training Institute in Noida
            [views] => 350
        )
    [9] => Array
        (
            [title] => The Best Quality Digital Marketing Training Institutes in Noida
            [views] => 416
        )
)
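The scraped view counts are strings with a thousands separator (e.g. "1,031"), so they need converting before you can add them up. A minimal sketch, assuming "," is the only separator used; the array reuses the first three title/view pairs listed above:

```php
<?php
// Sketch: convert scraped view strings such as "1,031" to integers and total them.
// Assumes "," is the thousands separator on the page.
$pageArticles = [
    ['title' => 'HTML Cheat Sheet', 'views' => '1,031'],
    ['title' => 'Best Java Training Institutes In Noida', 'views' => '390'],
    ['title' => 'Best Salesforce Training institutes in noida', 'views' => '329'],
];

$total = 0;
foreach ($pageArticles as $item) {
    // Drop the separator before casting; (int) "1,031" alone would give 1
    $total += (int) str_replace(',', '', $item['views']);
}

echo "Total views: $total", PHP_EOL;
```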