使用mysql和php创建网址树

Question

I've been banging my head all day at this and still can't come up with some working answers. 我整天都在为此努力，但仍然无法提出一些可行的答案。

I want to track all my visitors by cookie on my website and save the current URL (or path) that the user visits with the unique user key included in a mysql row. 我想通过我的网站上的cookie跟踪所有访问者，并使用mysql行中包含的唯一用户密钥保存用户访问的当前URL（或路径）。

After some time and some unique visitors I want to create a tree of paths that the users followed through my site. 经过一段时间和一些唯一的访问者后，我想创建一棵路径树，用户可以通过它们进入我的网站。 (something like Google Analytics does with there visitor flow reports). （类似Google Analytics（分析）的访问者流量报告）。

But I can't somehow figure out how to make the query that walks through these rows and creates a "tree" of urls (with count or percentage) 但是我无法以某种方式找出如何遍历这些行并创建网址“树”（带有计数或百分比）的查询

If someone could help me out, that will be much appreciated. 如果有人可以帮助我，将不胜感激。

-- Edit I already have a mysql database and tables in place -编辑我已经有一个mysql数据库和表

CREATE TABLE `journies` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `profile_id` int(11) NOT NULL,
  `url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `profiles` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- update 2 Oke, this may explain better what I want to do. -更新2 Oke，这可能会更好地解释我想要做什么。

I want to create a tree with urls like this: 我想用这样的网址创建一棵树：

                                                  /- domain.com/news (100%)
                     /- domain.com/about (33%) --/
domain.com (100%) --/
                    \- domain.com/contact (33%)
                    \- domain.com/news (33%) --\
                                                \- domain.com/news/id/1 (100%)

Ofcourse this may be done with multiple queries (although it will be great if it could be done with one ;)) 当然，这可以通过多个查询来完成（尽管如果可以通过一个查询来完成，那将是很好的选择）

Answer 1

Okay, this was a doozy, but I think I've finally got something for you to work with. 好的，这有点麻烦，但是我想我终于可以为您提供一些帮助。

The premise is to pull out the URLs and counts from the database, grouping and sorting them by url. 前提是从数据库中提取URL和计数，并按url将其分组和排序。 Store those results into a master array. 将这些结果存储到主阵列中。 Then, find the base domain of the urls, which I called the trunk. 然后，找到URL的基本域，我将其称为主干。 Loop through the master array and find any items that have the same base domain as the trunk and find their percentages. 遍历主阵列，找到与主干具有相同基本域的所有项目，并找到其百分比。

One other thing to note is that these are level-specific. 要注意的另一件事是，这些是特定于级别的。 So cnn.com/employees (2 levels) would be on the same level as cnn.com/news . 因此， cnn.com/employees （2个级别）将与cnn.com/news处于同一级别。

Here is the SQL data that I used for this: 这是我用于此的SQL数据：

INSERT INTO `journies` (`id`, `site_id`, `profile_id`, `url`, `created_at`, `updated_at`) VALUES
    (1, 1, 1, 'domain.com', '2014-02-19 15:34:54', '0000-00-00 00:00:00'),                      
    (2, 1, 1, 'domain.com/about', '2014-02-19 15:35:57', '0000-00-00 00:00:00'),                
    (3, 1, 1, 'domain.com/contact', '2014-02-19 15:36:12', '0000-00-00 00:00:00'),              
    (4, 1, 1, 'domain.com/news', '2014-02-19 15:36:29', '0000-00-00 00:00:00'),                 
    (5, 1, 1, 'domain.com/news/id/1', '2014-02-19 15:39:26', '0000-00-00 00:00:00'),            
    (6, 1, 1, 'domain.com/contact', '2014-02-19 15:50:26', '0000-00-00 00:00:00'),              
    (7, 1, 1, 'cnn.com/news/id/1', '2014-02-19 16:00:02', '0000-00-00 00:00:00'),               
    (8, 1, 1, 'cnn.com/news', '2014-02-19 16:00:15', '0000-00-00 00:00:00'),                    
    (9, 1, 1, 'cnn.com', '2014-02-19 16:00:25', '0000-00-00 00:00:00'),                         
    (10, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
    (11, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
    (12, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
    (13, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
    (14, 1, 1, 'domain.com/news/id/2', '2014-02-20 08:53:29', '0000-00-00 00:00:00'),           
    (15, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
    (16, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
    (17, 1, 1, 'cnn.com/employees/friekot', '2014-02-20 15:23:34', '0000-00-00 00:00:00'),      
    (18, 1, 1, 'cnn.com/employees', '2014-02-20 15:23:34', '0000-00-00 00:00:00');

And here is the code that I came up with: 这是我想出的代码：

<?php



$link = mysqli_connect("localhost", "user", "pass", "database");






// SET THE DEFAULTS
$trunk_array = array();
$master_array = array();






// PULL OUT THE DATA FROM THE DATABASE
$q_get_tracking_info = "SELECT *, COUNT(url) AS url_count FROM journies WHERE site_id = 1 AND profile_id = 1 GROUP BY url ORDER BY url;";
$r_get_tracking_info = mysqli_query($link, $q_get_tracking_info) or trigger_error("Cannot Get Tracking Info: (".mysqli_error().")", E_USER_ERROR);



while ($row_get_tracking_info = mysqli_fetch_array($r_get_tracking_info)) {



    $url = $row_get_tracking_info['url'];
    $url_count = $row_get_tracking_info['url_count'];






    // EXPLODE THE DOMAIN PARTS
    $domain_parts = explode('/', $url);






    // FIND THE TOTAL COUNTS FOR EACH LEVEL OF ARRAY
    // - SO THAT WE CAN DIVIDE BY IT LATER TO GET THE PERCENTAGE
    $count = count($domain_parts);
    if (!isset($level_totals[$domain_parts[0]])) {
        $level_totals[$domain_parts[0]] = array();
    }

    if (isset($level_totals[$domain_parts[0]][$count])) {
        $level_totals[$domain_parts[0]][$count] += $url_count;
    }
    else {
        $level_totals[$domain_parts[0]][$count] = $url_count;
    }






    // BUILD A TRUNK ARRAY SO WE CAN DEFINE SECTIONS
    if ($url == $domain_parts[0]) {
        $trunk_array[] = array($url, $url_count);
    }






    // BUILD A MASTER ARRAY OF THE ITEMS AS WE WILL LAY THEM OUT
    $master_array[$url] = $url_count;


}






// FIND THE TOTAL TRUNK COUNT SO WE CAN DIVIDE BY IT LATER
$total_trunk_count = 0;
foreach ($trunk_array AS $trunk_array_key => $trunk_array_val) {
    foreach($trunk_array_val AS $trunk_count_val) {
        $total_trunk_count += $trunk_count_val[0];
    }
}






// LOOP THROUGH THE TRUNK ITEMS AND PULL OUT ANY MATCHES FOR THAT TRUNK
foreach ($trunk_array AS $trunk_item_key => $trunk_item_val) {



    $trunk_item = $trunk_item_val[0];
    $trunk_count = $trunk_item_val[1];






    // FIND THE PERCENTAGE THIS TRUNK WAS ACCESSED
    $trunk_percent = round(($master_array[$trunk_item] / $total_trunk_count) * 100);






    // PRINT THE TRUNK OUT    
    print "<BR><BR>".$trunk_item.' - ('.$trunk_percent.'%)';






    // LOOP THROUGH THE MASTER ARRAY AND GET THE RESULTS FOR ANY PATHS UNDER THE TRUNK
    foreach ($master_array AS $master_array_key => $master_array_val) {



        // PERFORM A MATCH FOR DOMAINS BELONGING TO THIS PARTICULAR TRUNK
        if (preg_match('/^'.$trunk_item.'/', $master_array_key)) {



            // SET A DEFAULT DELIMITER PAD
            $delimiter_pad = '';






            // EXPLODE EACH PATH INTO PARTS AND COUNT HOW MANY PARTS WE HAVE
            $domain_parts_2 = explode('/', $master_array_key);
            $count = count($domain_parts_2);






            // SET THE DELIMITER FOR HOW FAR DOWN ON THE TREE WE ARE
            // EACH INDENT WILL HAVE 8 SPACES
            for ($i = 2; $i <= $count; $i++) {
                $delimiter_pad .= '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;';
            }






            // SINCE WE ALREADY PRINTED OUT THE TRUNK, WE WILL ONLY SHOW ITEMS THAT ARE NOT THE TRUNK
            if ($master_array_key != $trunk_item) {



                // FIND THE PERCENTAGE OF THE ITEM, GIVEN THEIR LEVEL IN THE TREE
                $path_percentage = round(($master_array[$master_array_key] / $level_totals[$trunk_item][$count]) * 100);






                // PRINT OUT THE PATH AND PERCENTAGE
                print "<BR>".$delimiter_pad."|- ".$master_array_key.' - ('.$path_percentage.'%)';



            }



        }



    }



}

In the end, all of that outputs this: 最后，所有这些输出如下：

cnn.com - (75%)
        |- cnn.com/employees - (50%)
                |- cnn.com/employees/friekot - (100%)
        |- cnn.com/news - (50%)
                        |- cnn.com/news/id/1 - (100%)

domain.com - (25%)
        |- domain.com/about - (17%)
        |- domain.com/contact - (33%)
        |- domain.com/news - (17%)
                        |- domain.com/news/id/1 - (75%)
                        |- domain.com/news/id/2 - (25%)
        |- domain.com/prices - (33%)

There may be an easier way to do this, but this is the method that came to mind for me. 可能有一种更简单的方法来执行此操作，但这是我想到的方法。 I hope this works for you! 希望这对您有用！

Answer 2

Okay, so you can write a query like this which will look for a particular user and site and spit out each url that they visited in the order they visited them. 好的，因此您可以编写这样的查询，该查询将查找特定的用户和站点，并按照访问的顺序吐出他们访问的每个url 。

$sql = "SELECT url FROM journies WHERE site_id = 1 AND profile_id = 1 ORDER BY created_at";

$result = mysqli_query($connection, $sql);

while ($row = mysqli_fetch_array($result)) {
    $page_array[] = $row['url'];
}

$page_list = implode("\n =>", $page_array);

This will output: 这将输出：

domain.com
 => domain.com/about

使用mysql和php创建网址树

问题描述

2 个解决方案

解决方案1
1 已采纳 2014-02-20 22:46:58

解决方案2
0 2014-02-19 19:08:09

使用mysql和php创建网址树

问题描述

2 个解决方案

解决方案1 1 已采纳 2014-02-20 22:46:58

解决方案2 0 2014-02-19 19:08:09

解决方案1
1 已采纳 2014-02-20 22:46:58

解决方案2
0 2014-02-19 19:08:09