I fetched page URLs using a PHP crawler. Now I want to count the total number of views (hits) on each page. How can I do it using PHP?

 $main_url = "http://programming.com";
 $str = file_get_contents($main_url);

 // Gets Webpage Title
 $title = '';
 if ($str !== false && strlen($str) > 0)
 {
      $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
      // case-insensitive, non-greedy match so only the first <title> is captured
      if (preg_match('/<title>(.*?)<\/title>/i', $str, $m)) {
           $title = $m[1];
      }
 }

 // Gets Webpage Description
 $url = parse_url($main_url);
 $tags = get_meta_tags($url['scheme'].'://'.$url['host']);
 $description = $tags['description'] ?? ''; // empty when the page has no description meta tag

 // Gets Webpage Internal Links
 $doc = new DOMDocument;
 libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
 $doc->loadHTML($str);

 $sec_url = [];
 foreach ($doc->getElementsByTagName('a') as $value)
 {
      // getAttribute() returns '' when the anchor has no href
      $href = $value->getAttribute('href');
      if ($href !== '') {
           $sec_url[] = $href;
      }
 }

 // $conn is the mysqli connection opened elsewhere (not shown in the question)
 $sq2 = "insert into datascience (link,title,description,internal_link) values (?,?,?,?)";
 $stmt = mysqli_prepare($conn, $sq2); // prepared statement avoids SQL injection from scraped text

 foreach ($sec_url as $value)
 {
      mysqli_stmt_bind_param($stmt, 'ssss', $main_url, $title, $description, $value);
      $res = mysqli_stmt_execute($stmt);
 }

I've converted the various methods you're using to find the details (title etc.) to XPath queries against the loaded document. This just makes things consistent.

The main task is to work out a consistent way of fetching the details. In the page you're using, each segment looks as though it's wrapped in an <article> tag. So first fetch all of these tags, then use each one as a base to look for the items you're after.

Building XPath expressions relative to each <article> means you can pick out all of the relevant details per item. In XPath, the descendant axis ( descendant::... ) indicates that you want nodes inside the context node, which is passed as the second argument to evaluate().
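To see the context node at work without fetching a live page, here is a minimal sketch that runs the same kind of query against an inline HTML string. The markup, the post titles and the `$rows` variable are made up for illustration; only the `<article>`, `h2[@class='title']` and `span[@title='Views']` structure is assumed to match the real page.

```php
<?php
// Hypothetical markup standing in for a page made of <article> blocks.
$html = <<<HTML
<html><body>
  <article><h2 class="title">First post</h2><span title="Views">12</span></article>
  <article><h2 class="title">Second post</h2><span title="Views">7</span></article>
</body></html>
HTML;

$doc = new DOMDocument;
libxml_use_internal_errors(true); // HTML5 tags such as <article> raise libxml warnings
$doc->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($doc);

$rows = [];
foreach ($doc->getElementsByTagName('article') as $article) {
    // The second argument is the context node: descendant:: only searches inside it.
    $rows[] = [
        'title' => $xp->evaluate("string(descendant::h2[@class='title'])", $article),
        'views' => (int) $xp->evaluate("string(descendant::span[@title='Views'])", $article),
    ];
}

print_r($rows);
```

Without the context node, an expression starting with `//` would search the whole document and return the first match for every article, which is why the relative `descendant::` form is used here.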

$main_url="http://programming.com";
$str = file_get_contents($main_url);

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($str);
$xp = new DOMXPath($doc);

$title = $doc->getElementsByTagName("title")[0]->textContent;
$description = $xp->evaluate("string(//meta[@name='description']/@content)");

echo $title.PHP_EOL;
echo $description.PHP_EOL;

$articles = $doc->getElementsByTagName('article');
$pageArticles = [];
foreach($articles as $article) {
    // The title is the <h2 class="title"> and the view count is the <span title="Views">
    $articleTitle = $xp->evaluate("string(descendant::h2[@class='title'])", $article);
    $articleViews = $xp->evaluate("string(descendant::span[@title='Views'])", $article);
    $pageArticles[] = ["title" => $articleTitle, "views" => $articleViews];
}

print_r($pageArticles);

Which gave me the following output:

Tutorials - Programming.com
Tap into the collective intelligence of researchers who are working on the same problems you are - right now.

Array
(
    [0] => Array
        (
            [title] => HTML Cheat Sheet
            [views] => 1,031
        )

    [1] => Array
        (
            [title] => Best Java Training Institutes In Noida 
            [views] => 390
        )

    [2] => Array
        (
            [title] => Best Salesforce Training institutes in noida
            [views] => 329
        )

    [3] => Array
        (
            [title] => Top Quality Digital Marketing Training Institutes in Noida    
            [views] => 382
        )

    [4] => Array
        (
            [title] => Make your studies with professional Best Oracle Training Institutes in Noida    
            [views] => 308
        )

    [5] => Array
        (
            [title] => Create a Unique Project with a Best Linux Training Institutes in Noida
            [views] => 374
        )

    [6] => Array
        (
            [title] => Webtrackker Technology Best Dot Net Training Institutes Available To Guide the Students 
            [views] => 385
        )

    [7] => Array
        (
            [title] => Availability of My University Help Offers Great Benefit to Students
            [views] => 430
        )

    [8] => Array
        (
            [title] => Webtrackker Institute of Professional Studies: Hadoop Training Institute in Noida    
            [views] => 350
        )

    [9] => Array
        (
            [title] => The Best Quality Digital Marketing Training Institutes in Noida
            [views] => 416
        )

)
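The scraped counts come back as strings, and some contain a thousands separator (for example 1,031), so they need cleaning before they can be summed or stored in a numeric column. The following is a rough sketch of one way to do that; `parseViewCount()`, `$cleaned` and `$totalViews` are hypothetical names, and the two sample rows reuse values from the output above.

```php
<?php
// Hypothetical helper: turn a scraped count such as "1,031" into an integer.
function parseViewCount(string $raw): int
{
    // Keep digits only, dropping thousands separators and surrounding whitespace.
    $digits = preg_replace('/\D+/', '', $raw);
    return $digits === '' ? 0 : (int) $digits;
}

// Sample values taken from the output above.
$pageArticles = [
    ['title' => 'HTML Cheat Sheet', 'views' => '1,031'],
    ['title' => 'Best Java Training Institutes In Noida ', 'views' => '390'],
];

// Trim stray whitespace from titles and convert the counts to integers.
$cleaned = array_map(function (array $row): array {
    return [
        'title' => trim($row['title']),
        'views' => parseViewCount($row['views']),
    ];
}, $pageArticles);

$totalViews = array_sum(array_column($cleaned, 'views'));
echo $totalViews . PHP_EOL; // prints 1421
```

Stripping every non-digit character is a deliberately blunt choice: it assumes the counts are plain whole numbers and would mangle abbreviated values such as "1.2k", if the site ever uses them.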

