How to display hourly stats from mysql table

Question

I'm trying to generate data for a graph in PHP that displays the number of records in a MySQL table within a certain time range, broken down by hour. Each record has a unix timestamp.

For example, say I want to display the stats for today. The code below "works" but after running it and looking at what I've done, it's just horrible gibberish that happens to work. When I run this on a table with millions of indexed records, it's slowwww.

What it does now is perform one query for each hour until it reaches 24 hours. The problem is that I'm trying to pull data from up to 10 other tables at the same time. This means I could be running up to 240 queries on every page load, which is not good.

$c = '0';
$h = '1';
while($h < 25){
    $hr_start = 3600 * $c;
    $hr_stop = 3600 * $h;
    $query = "SELECT `reason`,`timestamp`
    FROM `c_blacklist` 
    WHERE `timestamp` > '".strtotime('today')."'  + ".$hr_start." AND `timestamp` < '".strtotime('today')."' + ".$hr_stop." AND `reason` = 'hardbounce'";
    $result = mysql_query($query) or die(mysql_error());
    $hardbounce_count = mysql_num_rows($result);
    $dataset5[] = array($h,$hardbounce_count);
    $h++;
    $c++;
}

I know there is a better way to do this and I just haven't been able to find much information on it. Is there a way to run 1 query and then have PHP break it down by the hour and insert into the dataset? I'm so confused and I appreciate any help. Thanks.

Answer 1

You could create a sort of "reporting query" that when called, would give you the last 24 hours of data.

The first step is to create a reference table with 24 rows containing the numbers 1-24 (or 0-23 depending on your logic). I will call this table hours. By using this reference table, you will still get a 0 count if no activity occurred within a given hour. This is different from an approach that just does GROUP BY on the timestamp.

Then, use a combination of the TIMEDIFF and HOUR functions to left join to this table. Something like this (untested but you get the idea):

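SELECT
    COUNT(c_blacklist.reason) as num_reasons,
    hours.hour as hour
FROM hours
LEFT JOIN c_blacklist
   -- timestamp is a unix timestamp, so convert it before calling TIMEDIFF
   ON HOUR(TIMEDIFF(NOW(), FROM_UNIXTIME(c_blacklist.timestamp))) = hours.hour
   -- restrict the join to the last 24 hours (hours then needs the values 0-23)
  AND c_blacklist.timestamp > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
GROUP BY hours.hour
ORDER BY hours.hour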

This will output 24 rows, with the number of "reasons" from each of the past 24 hours. You could pretty easily add in some timestamps if you needed to.
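If adding a reference table is not an option, the same zero-filled "hours ago" buckets can be built on the PHP side. The following is only a rough sketch under that assumption: count_by_hours_ago is a hypothetical helper, and the sample timestamps are made up. It mirrors what HOUR(TIMEDIFF(NOW(), ...)) computes, given a list of unix timestamps that were already fetched.

```php
<?php
// Hypothetical helper: counts unix timestamps per "hours ago" bucket (0-23).
function count_by_hours_ago(array $timestamps, int $now): array
{
    // Start every bucket at zero so that quiet hours still appear.
    $counts = array_fill(0, 24, 0);
    foreach ($timestamps as $ts) {
        $age = $now - $ts;
        if ($age < 0 || $age >= 24 * 3600) {
            continue; // outside the last 24 hours
        }
        $counts[intdiv($age, 3600)]++;
    }
    return $counts;
}

$now = 1340737200; // sample "current time" (assumed value)
$counts = count_by_hours_ago(
    array($now - 10, $now - 3599, $now - 3600, $now - 90000),
    $now
);
echo $counts[0], ' ', $counts[1], ' ', $counts[23], "\n"; // prints "2 1 0"
```

Doing the count in the database is still likely to be faster on a large table, since this version needs every matching timestamp sent to PHP.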

Answer 2

It will be much faster to have the database return you a count, rather than pulling back all the detail rows and doing the count on the client side.

And you can pull the counts for a full 24 hour period in one query, which will (likely) be much more efficient than making 24 round trips to the database to get the individual counts.

Also, performance (of the query) will likely be improved if you have an index on c_blacklist(timestamp), or even better, a covering index on c_blacklist(timestamp,reason).

If the timestamp column is of datatype TIMESTAMP, then we can do some simple arithmetic to derive the "hour", and get a count by each "hour".

SELECT FROM_UNIXTIME((UNIX_TIMESTAMP(cb.`timestamp`) DIV 3600) * 3600) AS `cb_hour`
     , COUNT(1) AS cb_count
  FROM `c_blacklist` cb
 WHERE cb.`timestamp` >= DATE_ADD('2012-06-26 18:00',INTERVAL -1 DAY)
   AND cb.`timestamp` <  '2012-06-26 18:00'
   AND cb.`reason` = 'hardbounce'
 GROUP BY FROM_UNIXTIME((UNIX_TIMESTAMP(cb.`timestamp`) DIV 3600) * 3600)
 ORDER BY FROM_UNIXTIME((UNIX_TIMESTAMP(cb.`timestamp`) DIV 3600) * 3600)
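The (... DIV 3600) * 3600 arithmetic simply floors a unix timestamp to the start of its hour. As a small illustration, here is the same calculation in PHP; floor_to_hour is a hypothetical name, and the sample timestamp is arbitrary.

```php
<?php
// Floors a unix timestamp to the start of its hour, like (ts DIV 3600) * 3600.
function floor_to_hour(int $ts): int
{
    // Integer division drops the seconds past the hour; multiplying restores the scale.
    return intdiv($ts, 3600) * 3600;
}

$ts = 1340737537; // sample timestamp, 3937 seconds after 1340733600
echo floor_to_hour($ts), "\n"; // prints 1340737200
```

Because unix time counts seconds from a UTC epoch, these buckets line up with UTC hour boundaries, so a time zone with a half-hour offset would not see them start on its local hour.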

If the timestamp column is of datatype DATETIME, it might be faster to use a different expression to get the hour:

SELECT DATE_FORMAT(cb.`timestamp`,'%Y-%m-%d %H:00:00') AS `cb_hour`
     , COUNT(1) AS cb_count
  FROM `c_blacklist` cb
 WHERE cb.`timestamp` >= DATE_ADD('2012-06-26 18:00',INTERVAL -1 DAY)
   AND cb.`timestamp` <  '2012-06-26 18:00'
   AND cb.`reason` = 'hardbounce'
 GROUP BY DATE_FORMAT(cb.`timestamp`,'%Y-%m-%d %H:00:00')
 ORDER BY DATE_FORMAT(cb.`timestamp`,'%Y-%m-%d %H:00:00')

These queries will have "gaps" for hours where there are no rows to be counted; that is, they won't return a row with a count of zero for those hours.

That can be addressed by providing a row source that returns each value for "hour", and then performing a left join with the result set. In the following statement, the subquery aliased as h returns 24 rows, one for each hour. We use that as the driving row source for a left join against the "result" query (from above). Any place we don't get a match, we'll get a NULL for a count. And we can replace the NULL with a zero with a simple function call.

SELECT h.hour AS cb_hour
     , IFNULL(c.cb_count,0) AS cb_count
  FROM (SELECT DATE_ADD('2012-06-26 18:00',INTERVAL -1*(d.i + 1) HOUR) AS `hour`
          FROM (SELECT 00 AS i UNION ALL SELECT 01 UNION ALL SELECT 02 UNION ALL SELECT 03 
                UNION ALL SELECT 04 UNION ALL SELECT 05 UNION ALL SELECT 06 UNION ALL SELECT 07 
                UNION ALL SELECT 08 UNION ALL SELECT 09 UNION ALL SELECT 10 UNION ALL SELECT 11 
                UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15 
                UNION ALL SELECT 16 UNION ALL SELECT 17 UNION ALL SELECT 18 UNION ALL SELECT 19 
                UNION ALL SELECT 20 UNION ALL SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23 
                ORDER BY 1 DESC
               ) d
       ) h
  LEFT
  JOIN (SELECT FROM_UNIXTIME((UNIX_TIMESTAMP(cb.`timestamp`) DIV 3600) * 3600) AS `cb_hour`
             , COUNT(1) AS cb_count
          FROM `c_blacklist` cb
         WHERE cb.`timestamp` >= DATE_ADD('2012-06-26 18:00',INTERVAL -1 DAY)
           AND cb.`timestamp` < '2012-06-26 18:00'
           AND cb.`reason` = 'hardbounce'
         GROUP BY FROM_UNIXTIME((UNIX_TIMESTAMP(cb.`timestamp`) DIV 3600) * 3600)
         ORDER BY FROM_UNIXTIME((UNIX_TIMESTAMP(cb.`timestamp`) DIV 3600) * 3600)
       ) c
    ON c.cb_hour = h.hour
 ORDER BY h.hour

Granted, that's a lot more query text than you currently have.

To get that into my code, I would replace the three occurrences of the date literals with a '%s', and use sprintf to replace the three occurrences with a formatted date string. (The same value gets passed for all three occurrences.)
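A minimal sketch of that sprintf step might look like the following. It uses a shortened template with two placeholders rather than the full query with three, build_hourly_query is a hypothetical name, and it assumes the end of the range is available as a unix timestamp.

```php
<?php
// Hypothetical helper: builds query text for the 24 hours ending at $end (unix time).
function build_hourly_query(int $end): string
{
    // Formatting the literal from an integer keeps raw user input out of the SQL.
    // gmdate() assumes the MySQL session time zone is UTC; use date() otherwise.
    $end_str = gmdate('Y-m-d H:00', $end);

    // Shortened template: two placeholders here, three in the full query.
    $template = "SELECT COUNT(1) AS cb_count"
        . " FROM `c_blacklist` cb"
        . " WHERE cb.`timestamp` >= DATE_ADD('%s', INTERVAL -1 DAY)"
        . " AND cb.`timestamp` < '%s'"
        . " AND cb.`reason` = 'hardbounce'";

    // The same value is passed once per placeholder.
    return sprintf($template, $end_str, $end_str);
}

echo build_hourly_query(1340733600), "\n";
```

One thing to watch: sprintf treats every % as the start of a placeholder, so a template that contains a DATE_FORMAT pattern such as '%Y-%m-%d %H:00:00' needs each of those percent signs doubled (%%).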

Answer 3

Group by the hour value of the timestamp.

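// $start and $end are unix timestamps (integers) bounding the range, both inclusive.
// FROM_UNIXTIME is used because the question stores unix timestamps; for a
// DATETIME column, use date_format(`timestamp`,'%H') instead.
$query = "SELECT
    FROM_UNIXTIME(`timestamp`,'%H') day_hour,
    count(*) `count`
FROM
    `c_blacklist`
WHERE
    `timestamp` between $start and $end
    and `reason` = 'hardbounce'
GROUP BY
    FROM_UNIXTIME(`timestamp`,'%H')
ORDER BY
    1";

$result = mysql_query($query) or die(mysql_error());
while ($row = mysql_fetch_array($result)) {
    $dataset5[] = array($row['day_hour'], $row['count']);
}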
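A GROUP BY like this returns no row for an hour with no matches, so the graph data would skip those hours. One possible way to pad the result to 24 points in PHP is sketched below; fill_missing_hours is a hypothetical helper, and the sample rows are invented to match the day_hour and count columns of the query.

```php
<?php
// Hypothetical helper: turns sparse (day_hour, count) rows into 24 graph points.
function fill_missing_hours(array $rows): array
{
    // One slot per hour of the day, all starting at zero.
    $counts = array_fill(0, 24, 0);
    foreach ($rows as $row) {
        // '%H' yields zero-padded strings such as '07'; cast to an integer index.
        $counts[(int) $row['day_hour']] = (int) $row['count'];
    }

    $dataset = array();
    foreach ($counts as $hour => $count) {
        $dataset[] = array($hour, $count);
    }
    return $dataset;
}

// Sample rows shaped like the query result (values are made up).
$rows = array(
    array('day_hour' => '07', 'count' => '12'),
    array('day_hour' => '23', 'count' => '3'),
);
$dataset5 = fill_missing_hours($rows);
echo count($dataset5), ' ', $dataset5[7][1], ' ', $dataset5[8][1], "\n"; // prints "24 12 0"
```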

Answer 4

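// $hr_start and $hr_stop are offsets in seconds from midnight; set them to cover
// the whole range (for example 0 and 86400) so that one query replaces the loop.
$query = "SELECT FROM_UNIXTIME(`timestamp`, '%H') AS `Hour`, COUNT(*) AS `total`
FROM `c_blacklist`
WHERE `timestamp` > (".strtotime('today')." + ".$hr_start.") AND `timestamp` < (".strtotime('today')." + ".$hr_stop.") AND `reason` = 'hardbounce'
GROUP BY FROM_UNIXTIME(`timestamp`, '%H')";

I added some parentheses to make the order of operations explicit, and a FROM_UNIXTIME(timestamp, '%H') expression, which gives you the hour, assuming timestamp is an epoch/unix timestamp. Grouping on that expression and selecting COUNT(*) returns one row per hour with its count.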

4 answers

solution1
3 2012-06-26 19:05:37

solution2
2 2012-06-27 00:29:46

solution3
1 2012-06-26 19:10:51

solution4
0 2012-06-26 19:05:06
