SQL and PHP aggregating statistics, performance issue using COUNT hundreds of times

I've got a PHP page doing about 1000 SQL queries. It gives statistics on events that have occurred for a list of users. The page takes a bit long to load (6 seconds now, with indexes tweaked). I want to know if there is another/better way to do this than 1000 individual queries, and if there is a faster way, particularly as the data grows.

The results of those 1000 SQL queries are placed into PHP arrays and eventually populate the cells of an HTML table, like so:

         Installs    Called    Early Install   Event4   Event5    (... 9
George     5           6          3              5        29      different event
Greg       9           7          1              8        23      types, up to
David      4           1          2              4        0       maybe 15
Dan        15          17         4              20       10      eventually)
...        ...         ...        ...            ...      ...
...        ...         ...        ...            ...      ...
Totals     351         312        82             289      1220

(... there are up to ~50 users, maybe 100 total in the next two years)

Some columns are actually percentages that are calculated on the fly in PHP from the data, e.g. (event4/installs)*100.
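
In PHP that works out to roughly the following (a sketch with made-up variable names, including a guard for users with zero installs):

// Sketch of one percentage column; $counts holds this user's COUNT results.
$installs = isset($counts['Install']) ? $counts['Install'] : 0;
$event4   = isset($counts['Event4'])  ? $counts['Event4']  : 0;
// Avoid division by zero for users with no installs in the chosen range.
$event4Pct = ($installs > 0) ? round(($event4 / $installs) * 100, 1) : 0;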

The table is always over a given date range, like:
Choose date range: Jan 15, 2013 - March 31, 2013.

events table's fields: id, event_type, userid, event_date

The data itself is stored as a table of events that occur on certain dates. The most frequent type of SQL statement the PHP page fires is a COUNT query that looks like:

SELECT COUNT(id)
FROM events
WHERE userid = 10
    AND `event_date` BETWEEN '2013-01-01' AND '2013-02-15'
    AND event_type = 'Install';

SELECT COUNT(id)
FROM events
WHERE userid = 10
    AND `event_date` BETWEEN '2013-01-01' AND '2013-02-15'
    AND event_type = 'Called';

SELECT COUNT(id)
FROM events
WHERE userid = 10
    AND `event_date` BETWEEN '2013-01-01' AND '2013-02-15'
    AND event_type = 'Early Install';

/* and so on for each event type and user id */

These COUNT queries populate the cells of the HTML table. A PHP loop goes over each user (each row in the HTML output table), and within each row it goes over each event type (the columns) and runs a COUNT for each one. With ~50 users and ~10 event types, you get around ~1000 individual SQL requests on one page.
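
The loop that fires these looks roughly like the sketch below (simplified; $pdo, $users, $eventTypes, $startDate and $endDate stand in for the real code):

// Current per-cell approach: one COUNT query per user per event type.
$counts = array();
$stmt = $pdo->prepare(
    "SELECT COUNT(id) FROM events
     WHERE userid = ? AND event_date BETWEEN ? AND ? AND event_type = ?"
);
foreach ($users as $userId) {           // one HTML row per user
    foreach ($eventTypes as $type) {    // one HTML cell per event type
        $stmt->execute(array($userId, $startDate, $endDate, $type));
        $counts[$userId][$type] = (int)$stmt->fetchColumn();
    }
}
// ~50 users x ~10 event types => roughly 500-1000 round trips to MySQL per page load.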

  1. Is there a reasonable way to combine all these individual SQL COUNT operations, or to do this all faster or more correctly, without all of the individual COUNT calls coming from PHP? Maybe a stored procedure... does that make sense? If so, how would you approach it (a bunch of COUNT queries, a cursor, or what), and how do you construct and return rows of calculated count data from a stored procedure?
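
(To make the stored-procedure part concrete: as I understand it, a MySQL procedure returns rows to the caller simply by running a SELECT in its body, so wrapping one of the counts above would look roughly like the sketch below, though that alone wouldn't reduce the number of round trips.)

DELIMITER //
CREATE PROCEDURE count_events(
    IN p_userid INT,
    IN p_start  DATE,
    IN p_end    DATE,
    IN p_type   VARCHAR(50)
)
BEGIN
    -- Any SELECT run inside the procedure is sent back to the caller as a result set.
    SELECT COUNT(id) AS cnt
    FROM events
    WHERE userid = p_userid
      AND event_date BETWEEN p_start AND p_end
      AND event_type = p_type;
END //
DELIMITER ;

-- CALL count_events(10, '2013-01-01', '2013-02-15', 'Install');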

I guess I'm wanting to know, is this the "right way"®?

I'm not necessarily asking for an answer to the whole question, just answers to whatever part you might be able to answer, or how you'd approach it.

Also (#2): how might this stuff be cached? By bringing all the COUNT values into PHP and then writing those values from PHP to a MySQL table with a row for each user and each date range, or cached somewhere/somehow else?
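
(By a MySQL table I mean something roughly like this; the table and column names are just illustrative:)

CREATE TABLE event_count_cache (
    userid      INT          NOT NULL,
    event_type  VARCHAR(50)  NOT NULL,
    range_start DATE         NOT NULL,
    range_end   DATE         NOT NULL,
    cnt         INT          NOT NULL,
    PRIMARY KEY (userid, event_type, range_start, range_end)
);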

Grouping comes to mind.

SELECT userid, event_type, COUNT(id) AS cnt
FROM events
WHERE `event_date` BETWEEN '2013-01-01' AND '2013-02-15'
GROUP BY userid, event_type
ORDER BY userid, event_type

This would return an array where each row roughly has the structure of:

array(
    userid=>10,
    event_type=>'Installs',
    cnt=>5
);

And you can iterate over that to build your table.

//iterate over the data first constructing a new array for below
$newData = array();
$headers = array();

foreach($data as $row){
    //save the data in a multi dimensional array under the userid
    if(!isset($newData[$row['userid']])){
        $newData[$row['userid']]=array();
    }
    $newData[$row['userid']][$row['event_type']] = $row['cnt'];
    $headers[$row['event_type']]=1;
}
//get the headers
$headers = array_keys($headers);

//display the data for debugging
echo '<pre>'.print_r($newData,1).'</pre>';

echo "<table colspan=0 cellspacing=0 border=1>\n";
//add "user id" to the headers
array_unshift($headers, "User ID");
//echo the headers
echo "\t<thead>\n\t\t<th>".implode("</th>\n\t\t<th>", $headers)."</th>\n\t</thead>\n";
//remove the user id column from headers
array_shift($headers);

echo "\t<tbody>\n";
//now loop over the new data and display.
foreach($newData as $userID=>$row){
    //start row
    echo "\t\t<tr>\n";
    //user id
    echo "\t\t\t<td>{$userID}</td>\n";
    //loop over the headers. there should be corresponding keys for each header
    foreach($headers as $key){
        //get the count if the key exists and '-' if not.
        $cnt = isset($row[$key])?$row[$key]:'-';
        echo "\t\t\t<td>{$cnt}</td>\n";
    }
    echo "\t\t</tr>\n";
}
echo "\t</tbody>\n</table>\n";

Something like this should do it.

SELECT 
  userid,
  event_type,
  COUNT(id)
FROM 
  events
WHERE 
  `event_date` BETWEEN '2013-01-01' AND '2013-02-15'
GROUP BY 1, 2
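
From PHP you would run that query once per page load, binding the chosen date range, e.g. with PDO (assuming a connection in $pdo):

// One round trip instead of ~1000: bind the user-selected date range.
$stmt = $pdo->prepare(
    "SELECT userid, event_type, COUNT(id) AS cnt
     FROM events
     WHERE event_date BETWEEN ? AND ?
     GROUP BY userid, event_type"
);
$stmt->execute(array($startDate, $endDate));   // dates from the date-range picker
$data = $stmt->fetchAll(PDO::FETCH_ASSOC);     // one row per (user, event type) pair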

EDIT: This is only a partial answer. I'm not really an authority on caching :) Sorry, I can't help with that part.
