
Restructure query to group by month/year

I have a database containing millions of rows of information tracking an order's progress through our systems. From end to end, an order may pass through anywhere from 2 to 20 systems. Each part of this journey is recorded in the database, e.g.:

ORDER ID        SOURCE        DESTINATION        TIMESTAMP
10               Sys 1          Sys 2            01-Jan-14
10               Sys 2          Sys 3            01-Jan-14
10               Sys 3          Sys 4            03-Jan-14
10               Sys 4          Sys 5            07-Jan-14

The timestamp records when the order left that system.

I have a query I wrote to determine the length of each order:

Select ORDERID, 1 + TRUNC(MAX(TIMESTAMP)) - TRUNC(MIN(TIMESTAMP))
from DATABASE GROUP BY ORDERID

This works fine and, for the above order, produces 7 days. When I run this query over the whole table, I get the end-to-end duration of every single order in the database. I can then use all these individual totals to find the overall average order time.

This is all good, but I'd like now to be able to break this down into individual month/year pairings, so I can effectively see if the average length of time in the system has increased or decreased in a given month.

I'm fairly inexperienced with SQL and I really have no idea where to start. How could I write a query that takes the start date of each order, works out how long it stays in the system, and produces the average number of days in the system per month/year combination?

Sample Data

Currently, the above query produces a series of tuples like this:

Order Id    Days in System
0145240     1
10000       1
10001       1
10003       130
10004       3
10007       1
10008       13
10009       1
10010       1

I can then find the average of all this information. What I would really like though is to be able to do something like this:

ORDER ID        SOURCE        DESTINATION        TIMESTAMP
10               Sys 1          Sys 2            01-Jan-14
10               Sys 2          Sys 3            01-Jan-14
10               Sys 3          Sys 4            03-Jan-14
10               Sys 4          Sys 5            07-Jan-14
11               Sys 1          Sys 2            01-Feb-14
11               Sys 2          Sys 3            03-Mar-14
12               Sys 1          Sys 2            04-Mar-14
12               Sys 2          Sys 3            05-Mar-14
13               Sys 1          Sys 2            07-Mar-14
13               Sys 2          Sys 3            14-Mar-14

Imagine all the above are completed orders.

OrderID 10: Took 7 days to go from end to end.
OrderID 11: Took 31 days to go from end to end.
OrderID 12: Took 2 days to go from end to end.
OrderID 13: Took 8 days to go from end to end.

OrderID 10 was the only order in January, OrderID 11 was the only order in February, and OrderIDs 12 and 13 both started in March. Therefore, ideally, the query I want to design would produce the following:

Jan 2014:    Average = 7
Feb 2014:    Average = 31
Mar 2014:    Average = 5 (i.e. (2 + 8) / 2)

On a month-wise basis:

Select ORDERID,
       -- month name of each row's timestamp (the column is already a date,
       -- so it is formatted directly rather than passed through to_date)
       to_char(TIMESTAMP, 'Month') as month_name,
       1 + TRUNC(MAX(TIMESTAMP)) - TRUNC(MIN(TIMESTAMP)) as duration
from DATABASE
GROUP BY ORDERID, to_char(TIMESTAMP, 'Month')
Order By ORDERID, duration

Similarly, you can extract the year from the timestamp column and group by order ID and year to track duration on a yearly basis per order ID.
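One thing to watch when grouping on the month of each row: an order that starts in one month and finishes in the next is split into one group per month, so its full duration is never measured. The Python sketch below is only an illustration of that effect, not part of the query. It uses OrderID 11 from the question's expanded sample data, and the helper names are made up for the example.

```python
from datetime import date

# OrderID 11 from the question: starts in February, finishes in March.
ORDER_11 = [date(2014, 2, 1), date(2014, 3, 3)]


def duration_by_row_month(timestamps):
    """Mimic GROUP BY orderid, month-of-row: one duration per calendar month."""
    by_month = {}
    for ts in timestamps:
        by_month.setdefault(ts.month, []).append(ts)
    # Same arithmetic as 1 + TRUNC(MAX) - TRUNC(MIN), applied within each group.
    return {m: 1 + (max(v) - min(v)).days for m, v in by_month.items()}


def duration_whole_order(timestamps):
    """Mimic GROUP BY orderid alone: one duration for the whole order."""
    return 1 + (max(timestamps) - min(timestamps)).days


if __name__ == "__main__":
    print(duration_by_row_month(ORDER_11))  # {2: 1, 3: 1}
    print(duration_whole_order(ORDER_11))   # 31
```

Grouping per row gives two one-day fragments instead of the 31 days the question expects, which is why it helps to key each order on its start date instead.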

You could look at analytic functions, but a fairly simple way would be to add the 'start' date (which is a little confusing, as it seems to be the timestamp when the order left the first system, not when it arrived there?):

select orderid, min(timestamp) as first_seen,
  1 + trunc(max(timestamp)) - trunc(min(timestamp)) as duration
from database
group by orderid
order by orderid;

With some additional data that might give you:

   ORDERID FIRST_SEEN                     DURATION
---------- ---------------------------- ----------
        10 01-JAN-14 09.00.00.000000000          7 
        11 01-JAN-14 09.00.00.000000000          2 
        12 31-JAN-14 09.00.00.000000000          3 
        13 01-FEB-14 09.00.00.000000000          2 

You can then use that as a subquery and average by grouping over the first date of the month of the 'first seen' date:

select trunc(first_seen, 'MM') as month,
  avg(duration) as duration
from (
  select orderid, min(timestamp) as first_seen,
    1 + trunc(max(timestamp)) - trunc(min(timestamp)) as duration
  from database group by orderid
)
group by trunc(first_seen, 'MM')
order by trunc(first_seen, 'MM');

MONTH       DURATION
--------- ----------
01-JAN-14          4 
01-FEB-14          2 


Calling a table 'database' is a bit confusing, as it's a keyword (though not reserved, so it's legal). And calling a column 'timestamp' is also a bit odd, particularly if it's actually a date rather than a timestamp - it isn't clear which your actual table has. But as you've changed the names for posting this is rather moot.

Or with your expanded sample data:

   ORDERID FIRST_SEEN                     DURATION
---------- ---------------------------- ----------
        10 01-JAN-14 00.00.00.000000000          7 
        11 01-FEB-14 00.00.00.000000000         31 
        12 04-MAR-14 00.00.00.000000000          2 
        13 07-MAR-14 00.00.00.000000000          8 

MONTH       DURATION
--------- ----------
01-JAN-14          7 
01-FEB-14         31 
01-MAR-14          5 
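As a cross-check outside the database, the same two-step logic (per-order duration first, then an average keyed on the start month) can be sketched in Python. This is only an illustrative sketch: the row list is typed in by hand from the question's expanded sample data, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Expanded sample data from the question: (order_id, timestamp) per hop.
ROWS = [
    (10, date(2014, 1, 1)), (10, date(2014, 1, 1)),
    (10, date(2014, 1, 3)), (10, date(2014, 1, 7)),
    (11, date(2014, 2, 1)), (11, date(2014, 3, 3)),
    (12, date(2014, 3, 4)), (12, date(2014, 3, 5)),
    (13, date(2014, 3, 7)), (13, date(2014, 3, 14)),
]


def average_duration_by_start_month(rows):
    """Mirror the subquery: per-order duration, then average per start month."""
    # Inner query: min and max timestamp per order.
    first_seen, last_seen = {}, {}
    for order_id, ts in rows:
        first_seen[order_id] = min(ts, first_seen.get(order_id, ts))
        last_seen[order_id] = max(ts, last_seen.get(order_id, ts))

    # Outer query: bucket durations by the first day of the start month,
    # the equivalent of trunc(first_seen, 'MM').
    buckets = defaultdict(list)
    for order_id, start in first_seen.items():
        duration = 1 + (last_seen[order_id] - start).days
        buckets[start.replace(day=1)].append(duration)

    return {month: sum(d) / len(d) for month, d in sorted(buckets.items())}


if __name__ == "__main__":
    for month, avg in average_duration_by_start_month(ROWS).items():
        print(month.strftime("%b %Y"), avg)
```

With this data it prints averages of 7.0, 31.0 and 5.0 for January, February and March 2014, matching the table above.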
