I have extracted page URLs using a PHP crawler, and now I want to count the total number of views (hits) shown on each page. How can I do this with PHP?

Question

 $main_url="http://programming.com";
 $str = file_get_contents($main_url);

 // Gets Webpage Title
 if(strlen($str)>0)
 {

      $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
      preg_match("/\<title\>(.*)\<\/title\>/i",$str,$title); // ignore case
      $title=$title[1];
 }

 // Gets Webpage Description
 $b =$main_url;
 @$url = parse_url( $b );
 @$tags = get_meta_tags($url['scheme'].'://'.$url['host'] );
 $description=$tags['description'];

 // Gets Webpage Internal Links
 $doc = new DOMDocument; 
 @$doc->loadHTML($str); 

 $items = $doc->getElementsByTagName('a'); 
 foreach($items as $value) 
 { 
      $attrs = $value->attributes; 

      $sec_url[]=$attrs->getNamedItem('href')->nodeValue;

 }

 /*foreach ($sec_url as  $value) {
        print_r($value);

        ?>
    <br>
        <?php

}*/

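// $conn is assumed to be an open mysqli connection created earlier.
$stmt = mysqli_prepare($conn, "insert into datascience (link,title,description,internal_link) values (?,?,?,?)");
foreach($sec_url as $value)
{
    // Bind the values per row so quotes in scraped text cannot break the query.
    mysqli_stmt_bind_param($stmt, 'ssss', $main_url, $title, $description, $value);
    $res = mysqli_stmt_execute($stmt);
}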

Answer 1

I've converted the various methods you're using to find the details (title etc.) to use XPath within the loaded document. This just makes things consistent.

The main task is to work out a consistent way of fetching the details. In the page you're using, each segment looks as though it's wrapped in an <article> tag. So first fetch all of these tags, then use each one as a base to look for the items you're after.

Building XPath expressions to locate them within each <article> then means you can pick out all of the relevant details per item. In XPath, you use the descendant axis (descendant::...) to indicate that you want the nodes inside the context node (passed in as the last parameter to evaluate()).
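To see the effect of the context node in isolation, here is a minimal sketch that runs on an inline HTML fragment instead of the live page. The markup is invented for illustration and only assumed to resemble the real page. It contrasts a relative descendant:: expression with one that starts with //, which always searches from the document root even when a context node is passed.

```php
<?php
// Hypothetical markup, invented for this sketch; the real page may differ.
$html = <<<HTML
<html><body>
  <article><h2 class="title">First post</h2><span title="Views">1,031</span></article>
  <article><h2 class="title">Second post</h2><span title="Views">390</span></article>
</body></html>
HTML;

$doc = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($doc);

$rows = [];
foreach ($doc->getElementsByTagName('article') as $article) {
    $rows[] = [
        // Relative to $article: finds the heading inside this article only.
        'relative' => $xp->evaluate("string(descendant::h2[@class='title'])", $article),
        // Starts with //, so it searches the whole document and ignores $article.
        'absolute' => $xp->evaluate("string(//h2[@class='title'])", $article),
    ];
}
print_r($rows);
```

For the second article, 'relative' holds "Second post" while 'absolute' still holds "First post", which is why the per-article expressions use the descendant axis.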

$main_url="http://programming.com";
$str = file_get_contents($main_url);

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($str);
$xp = new DOMXPath($doc);

$title = $doc->getElementsByTagName("title")[0]->textContent;
$description = $xp->evaluate("string(//meta[@name='description']/@content)");

echo $title.PHP_EOL;
echo $description.PHP_EOL;

$articles = $doc->getElementsByTagName('article');
$pageArticles = [];
foreach($articles as $article) {
    // The heading holds the article title; the span marked "Views" holds the counter.
    $articleTitle = $xp->evaluate("string(descendant::h2[@class='title'])", $article);
    $articleViews = $xp->evaluate("string(descendant::span[@title='Views'])", $article);
    $pageArticles[] = ["title" => $articleTitle, "views" => $articleViews];
}

print_r($pageArticles);

Which gave me the following output:

Tutorials - Programming.com
Tap into the collective intelligence of researchers who are working on the same problems you are - right now.

Array
(
    [0] => Array
        (
            [title] => HTML Cheat Sheet
            [views] => 1,031
        )

    [1] => Array
        (
            [title] => Best Java Training Institutes In Noida
            [views] => 390
        )

    [2] => Array
        (
            [title] => Best Salesforce Training institutes in noida
            [views] => 329
        )

    [3] => Array
        (
            [title] => Top Quality Digital Marketing Training Institutes in Noida
            [views] => 382
        )

    [4] => Array
        (
            [title] => Make your studies with professional Best Oracle Training Institutes in Noida
            [views] => 308
        )

    [5] => Array
        (
            [title] => Create a Unique Project with a Best Linux Training Institutes in Noida
            [views] => 374
        )

    [6] => Array
        (
            [title] => Webtrackker Technology Best Dot Net Training Institutes Available To Guide the Students
            [views] => 385
        )

    [7] => Array
        (
            [title] => Availability of My University Help Offers Great Benefit to Students
            [views] => 430
        )

    [8] => Array
        (
            [title] => Webtrackker Institute of Professional Studies: Hadoop Training Institute in Noida
            [views] => 350
        )

    [9] => Array
        (
            [title] => The Best Quality Digital Marketing Training Institutes in Noida
            [views] => 416
        )

)
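
Since the question asks for a total, note that the scraped counters are strings such as "1,031", so they need converting before they can be added up. The following is a minimal sketch, assuming the counters are plain integers with optional thousands separators (abbreviated forms like "1.2k" would need extra handling). parseViewCount is a hypothetical helper name, not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper: turn a scraped counter such as "1,031" into an integer.
function parseViewCount(string $text): int
{
    // Keep digits only, dropping thousands separators and stray whitespace.
    $digits = preg_replace('/\D+/', '', $text);
    return $digits === '' ? 0 : (int) $digits;
}

// View counters as scraped strings; the first three values from the output above.
$scraped = ["1,031", "390", "329"];

// Convert each counter and add them up.
$totalViews = array_sum(array_map('parseViewCount', $scraped));
echo $totalViews . PHP_EOL; // prints 1750
```

In the crawler itself, the same helper could be applied to each view string inside the article loop before the value is stored.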

Solution 1
0 2019-07-28 07:25:54
