
select statement for averages based on different date ranges in one MySQL query

Basically I am attempting to make a chart with this data. I am able to put my query into a while loop in PHP to get each average, but I would prefer this was done with one query producing one result table.

<?php 

date_default_timezone_set('America/Los_Angeles');

include('../connect.php');

$subcategory = 'T-Shirts';

$date = date('Y-m-d', strtotime('-29 days'));
$today = date("Y-m-d");

$subcategory = mysqli_real_escape_string($conp, $subcategory);

echo "<table border=\"1\">";
echo "<tr>";
echo "<th>date</th>";
echo "<th>average</th>";
echo "</tr>";

while (strtotime($date) <= strtotime($today)) {

    $from_date = date ("Y-m-d", strtotime("-29 day", strtotime($date)));

    $query = $conp->query("SELECT ROUND(SUM(OutCount)/30) AS 'average' FROM inventory
    LEFT JOIN item
    ON inventory.itemcode = item.itemcode
    WHERE item.subcategory = '$subcategory'
    AND TrDateTime BETWEEN '$from_date' AND '$date' AND transactiontype like 'OUT_%'"); 

    if($query->num_rows){       
        while($row = mysqli_fetch_array($query, MYSQLI_ASSOC)){
            if(!empty($row['average'])){
                $average = $row['average'];
            }else{
                $average = "N/A";
            }
        }                       
        mysqli_free_result($query);                             
    }else{
        $average = "N/A";
    }

    echo "<tr>";
    echo "<td>" . $date . "</td>";
    echo "<td>" . $average . "</td>";
    echo "</tr>";

    // Advance only after the row is printed, so each average is labeled with its own end date.
    $date = date("Y-m-d", strtotime("+1 day", strtotime($date)));
}

echo "</table>";

?>

I get all the dates in the past 30 days (including today) and the average sales from a range of 29 days prior until that date.

+------------+----------+  
| date       | average  |  
+------------+----------+  
| 2015-04-09 | 222      |  
| 2015-04-10 | 225      |  
| 2015-04-11 | 219      |  
| ...        | ...      |  
+------------+----------+  

I am able to get everything I need this way, but it runs 30 queries in this situation, and a single MySQL query would be substantially quicker. I started to write a MySQL procedure, but I am not sure how well it will work when I try to call it from PHP.

DELIMITER //
    CREATE PROCEDURE average_daily_sales()
    BEGIN

        SET @today = CURDATE();
        SET @date_var = CURDATE() - INTERVAL 29 DAY;
        SET @from_date = @date_var - INTERVAL 29 DAY;
        SET @to_date = @from_date + INTERVAL 29 DAY;

        label1: WHILE @date_var < @today DO

            SELECT      DATE_FORMAT(trdatetime, '%Y-%m-%d') as 'date', ROUND(SUM(OutCount)/30) AS 'average'
            FROM        inventory
            LEFT JOIN   item
            ON          inventory.itemcode = item.itemcode
            WHERE       item.subcategory = 'T-Shirts'
            AND         trdatetime BETWEEN @from_date - INTERVAL 29 DAY AND @to_date
            AND         transactiontype like 'OUT_%';

            SET @date_var = @date_var + INTERVAL 1 DAY;

        END WHILE label1;    

    END; //
DELIMITER ;

Ultimately, I would prefer a regular MySQL statement that I can use to produce the desired result table in one shot. Any help would be greatly appreciated.

If you create a calendar table and populate it with a range of date values, e.g.

CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) ;
INSERT INTO cal VALUES ('2015-04-01'),('2015-04-02'),('2015-04-03'), ... ;

you could use that as a row source, in a query like this:

SELECT cal.dt
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foo'
            AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
            AND n.TrDateTime <  cal.dt + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM cal
 WHERE cal.dt >= '2015-04-01'
   AND cal.dt <  '2015-05-01'
 ORDER BY cal.dt

It's not mandatory to create a cal calendar table. We could use an inline view and give it an alias of cal. For example, in the query above, we could replace this line:

  FROM cal

with this:

  FROM ( SELECT DATE('2015-04-01') AS dt
         UNION ALL SELECT DATE('2015-04-02')
         UNION ALL SELECT DATE('2015-04-03')
         UNION ALL SELECT DATE('2015-04-04')
         UNION ALL SELECT DATE('2015-04-05')
       ) cal

Or, if you have a rowsource that can give you a contiguous series of integers, starting at zero and running up to some limit, you could manufacture your date values from a base date, for example:

   FROM ( SELECT '2015-04-01' + INTERVAL i.n DAY AS dt
            FROM source_of_integers i
           WHERE i.n >= 0
             AND i.n < 31
           ORDER BY i.n
        ) cal

Some notes:

The original query shows an outer ( LEFT ) join, but the equality predicate on item.subcategory in the WHERE clause negates the "outerness" of the join; it is equivalent to an inner join.

Some of the column references in the query are not qualified. Best practice is to qualify all column references, so the reader can tell which columns come from which tables without already knowing the schema. This also protects the statement from breaking in the future (with an "ambiguous column" error) when a column with the same name is added to another table referenced in the query.

FOLLOWUP

Personally, for a limited number of date values, I'd go with the inline view that doesn't reference a table. I'd have the PHP code generate that query for me.

With a starting date, say it's '2015-04-10', I'd take that date value and format it into a query, equivalent to doing this:

$cal = "SELECT DATE('2015-04-10') AS dt" ;

Then I'd spin through a loop, incrementing that date value by 1 day. Each time through the loop, I'd append to $cal a select of the next date. The net effect of running through the loop three times would be equivalent to doing this:

$cal .= " UNION ALL SELECT DATE('2015-04-11')";
$cal .= " UNION ALL SELECT DATE('2015-04-12')";
$cal .= " UNION ALL SELECT DATE('2015-04-13')";

As a less attractive alternative, we could keep repeating the same value of the start date, and just increment an integer value, and let MySQL do the date math for us.

$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 1 DAY";
$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 2 DAY";
$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 3 DAY";
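The loop itself could look roughly like the sketch below. The helper name build_cal_sql and its parameters are my own invention for illustration, not part of any library. It parses the start date strictly, so that only a genuine Y-m-d value is ever concatenated into the SQL text.

```php
<?php
// Hypothetical helper (name and signature are assumptions): builds the inline
// "cal" view as a chain of UNION ALL selects, one select per day.
function build_cal_sql(string $startDate, int $days): string
{
    // Parse strictly so that only a real Y-m-d date ends up in the SQL text.
    $day = DateTimeImmutable::createFromFormat('!Y-m-d', $startDate);
    if ($day === false || $day->format('Y-m-d') !== $startDate || $days < 1) {
        throw new InvalidArgumentException('expected a Y-m-d date and a positive day count');
    }

    // The first select names the column; the following selects inherit that name.
    $cal = "SELECT DATE('" . $day->format('Y-m-d') . "') AS dt";
    for ($i = 1; $i < $days; $i++) {
        $day = $day->modify('+1 day');
        $cal .= " UNION ALL SELECT DATE('" . $day->format('Y-m-d') . "')";
    }
    return $cal;
}

// Four days starting at 2015-04-10: the first select plus three UNION ALL selects.
$cal = build_cal_sql('2015-04-10', 4);
echo $cal, PHP_EOL;
```

Called with '2015-04-10' and 4, this produces the same text as the initial assignment plus the three appended selects shown above.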

Then, I'd just slide the $cal query into the SQL text as an inline view query. Something like this:

$sql = "SELECT cal.dt
             , ( SELECT IFNULL(ROUND(SUM
                 ,0) AS average_
          FROM ( " . $cal . " ) cal
          LEFT
          JOIN item ON ... ";

Anyway, that's the approach I'd take if this was for a limited number of date values (a couple dozen or so), and if I was only going to be running this query occasionally, not hammering the database server with it on every request. If I was going to pound the server, I'd create and maintain a real cal table, rather than incur the overhead of materializing a derived table on every query.
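Putting the pieces together, here is one way the full statement might be assembled in PHP. This is only a sketch under some assumptions: build_average_sql is a hypothetical helper name, it uses the correlated-subquery form from the first query in this answer rather than a LEFT JOIN, and the subcategory is left as a ? placeholder so it can be bound instead of escaped by hand.

```php
<?php
// Hypothetical helper (name is an assumption): wraps a generated calendar
// query in the rolling-average statement, using a correlated subquery.
function build_average_sql(string $cal): string
{
    return "SELECT cal.dt
     , ( SELECT IFNULL(ROUND(SUM(n.OutCount)/30), 0)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = ?
            AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
            AND n.TrDateTime <  cal.dt + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS average
  FROM ( " . $cal . " ) cal
 ORDER BY cal.dt";
}

// A two-day calendar, written out by hand for the example.
$cal = "SELECT DATE('2015-04-10') AS dt UNION ALL SELECT DATE('2015-04-11')";
$sql = build_average_sql($cal);
echo $sql, PHP_EOL;

// With a mysqli connection (called $conp in the question), the placeholder
// would then be bound as a string parameter:
//   $stmt = $conp->prepare($sql);
//   $stmt->bind_param('s', $subcategory);
//   $stmt->execute();
```

The IFNULL(..., 0) means days with no matching rows come back as 0 rather than NULL; drop it if you would rather show "N/A" for those days.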

Do you have data on each distinct day in the range? If so, this is a slightly complex join operation, but very doable.

You can get the date ranges you need as follows:

        SELECT DISTINCT
               DATE(trdatetime)- INTERVAL 30 DAY AS startdate,
               DATE(trdatetime)                  AS enddateplus1
          FROM inventory
         WHERE trdatetime >= NOW() - INTERVAL 31 DAY

Debug this query. Take a look to make sure you get each date range you want.

Then you can join this to your business query like so

  SELECT dates.startdate, 
         ROUND(SUM(OutCount)/30) AS 'average'
   FROM (
        SELECT DISTINCT
               DATE(trdatetime)- INTERVAL 30 DAY AS startdate,
               DATE(trdatetime)                  AS enddateplus1
          FROM inventory
         WHERE trdatetime >= NOW() - INTERVAL 31 DAY
        ) dates
   LEFT JOIN inventory i ON i.trdatetime >= dates.startdate
                        AND i.trdatetime <  dates.enddateplus1 
   LEFT JOIN  item ON  i.itemcode = item.itemcode
  WHERE item.subcategory = 'T-Shirts'
    AND transactiontype like 'OUT_%'
  GROUP BY dates.startdate

If your inventory data is sparse, that is, you don't have transactions on all days, then your dates query will be missing some rows.

There's a way to fill in those missing rows, but it's a pain in the neck. Read this for more info: http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/

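Notice that BETWEEN works very badly for filtering DATETIME or TIMESTAMP values: its upper bound is inclusive, and a date-only bound such as '2015-04-10' means midnight at the start of that day, so rows from later that day are excluded. That is why the queries above use >= for the start of the range and < for the day after the end.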
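You can convince yourself of this without touching the database. The sketch below only mimics the comparison in plain PHP, and the timestamps are made up for illustration.

```php
<?php
// Illustration only: mimics the range comparison with made-up timestamps.
$tz = new DateTimeZone('UTC');

// A sale recorded on the afternoon of the last day of the range.
$rowTime = new DateTimeImmutable('2015-04-10 15:30:00', $tz);

// BETWEEN '2015-03-12' AND '2015-04-10': a date-only bound is midnight,
// and both ends are inclusive.
$lower = new DateTimeImmutable('2015-03-12', $tz);
$upper = new DateTimeImmutable('2015-04-10', $tz);
$betweenMatches = ($rowTime >= $lower && $rowTime <= $upper);

// Half-open range: >= the start AND < the day after the end date.
$upperExclusive = $upper->modify('+1 day');
$halfOpenMatches = ($rowTime >= $lower && $rowTime < $upperExclusive);

var_dump($betweenMatches);   // bool(false): the afternoon sale is missed
var_dump($halfOpenMatches);  // bool(true)
```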

Each of the suggestions from @OllieJones and @spencer7593 required one of three things: a transaction on every day (in order to use SELECT DISTINCT DATE(trdatetime)), an additional table, or a generated derived table.

SELECT DISTINCT DATE(trdatetime) wasn't an option for me because I did not have transactions for every day.

The hybrid PHP and MySQL example that @spencer7593 suggested generates a derived table very well. The static version took about 1.8 seconds to return a result. The drawback is that you need additional PHP to generate it (see @spencer7593's answer):

SELECT cal.dt
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foo'
            AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
            AND n.TrDateTime <  cal.dt + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM ( SELECT DATE('2015-04-01') AS dt
        UNION ALL SELECT DATE('2015-04-02')
        UNION ALL SELECT DATE('2015-04-03')
        UNION ALL SELECT DATE('2015-04-04')
        UNION ALL SELECT DATE('2015-04-05')
        UNION ALL SELECT DATE('2015-04-06')
        -- etc...
       ) cal
 WHERE cal.dt >= '2015-04-01'
   AND cal.dt <  '2015-05-01'
 ORDER BY cal.dt

I also tried another of @spencer7593's suggestions. I created a "source of integers" table with the numbers 0-31, as he suggested. This method took a little over 1.8 seconds.

SELECT cal.sd, cal.ed
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foobar'
            AND n.TrDateTime >= cal.ed + INTERVAL -30 DAY
            AND n.TrDateTime <  cal.ed + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM ( SELECT (CURDATE() + INTERVAL -30 DAY) + INTERVAL i.n DAY as `ed`, (((CURDATE() + INTERVAL -30 DAY) + INTERVAL i.n DAY) + INTERVAL - 30 DAY) as `sd`
            FROM source_of_integers i
           WHERE i.n >= 0
             AND i.n < 31
           ORDER BY i.n
        ) cal
WHERE cal.ed >= CURDATE() + INTERVAL -29 DAY
   AND cal.ed <=  CURDATE()
 ORDER BY cal.ed;

You need a rowsource for these dates; there isn't really a way around that. In the end I made a cal table:

CREATE TABLE cal (
    dt DATE NOT NULL PRIMARY KEY
);

CREATE TABLE ints ( i tinyint );

INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

INSERT INTO cal (dt)
SELECT DATE('2010-01-01') + INTERVAL a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i DAY
FROM ints a JOIN ints b JOIN ints c JOIN ints d JOIN ints e
WHERE (a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i) <= 3651
ORDER BY 1;

I then ran a slightly modified version of @spencer7593's answer against it:

SELECT cal.dt
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foo'
            AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
            AND n.TrDateTime <  cal.dt + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM cal
WHERE cal.dt >= CURDATE() + INTERVAL -30 DAY
    AND cal.dt <  CURDATE()
ORDER BY cal.dt;

In my opinion, this is the cleanest (least PHP) and best-performing approach.

Here is how I indexed the inventory table to speed it up substantially:

ALTER TABLE inventory ADD KEY (ItemCode, TrDateTime, TransactionType);

Thank you @OllieJones and @spencer7593 for all of your help!
