
SAS SQL hierarchic query

I'm trying to write a SQL query (PROC SQL) that turns data from this example table:

order_id    base_order_id    customer_id
==========================================
1           null             1              // only one transaction
------------------------------------------
2           null             1              // order_start
3           2                1
4           3                1
5           4                1
6           5                1
7           6                1              // order_end
------------------------------------------

into this result:

order_id    last_order_id    customer_id
==========================================
1           null             1
2           7                1

Put another way: order_id 2 starts a chain of six orders, each pointing back to the previous one through base_order_id. The complete order for that customer therefore consists of order_ids 2 through 7: it starts at order_id 2 and ends at order_id 7.
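The rule described above can be made concrete with a short sketch. It is written in Python purely for illustration (it is not PROC SQL), and it assumes each order is continued by at most one later order, so every chain is a simple line with no branches or cycles. ROWS is a hypothetical in-memory copy of the example table, with None standing in for null.

```python
# Hypothetical in-memory copy of the example table: (order_id, base_order_id, customer_id).
# None plays the role of SQL null / SAS missing.
ROWS = [
    (1, None, 1),
    (2, None, 1),
    (3, 2, 1),
    (4, 3, 1),
    (5, 4, 1),
    (6, 5, 1),
    (7, 6, 1),
]


def collapse_chains(rows):
    """Return (order_id, last_order_id, customer_id) for each chain start.

    Assumes every order has at most one follow-up order and there are no cycles.
    """
    # Map each base order to the order that continues it.
    successor = {base: order for order, base, _ in rows if base is not None}
    result = []
    for order, base, customer in rows:
        if base is not None:
            continue  # not a chain start
        last = order
        while last in successor:  # walk forward until nothing continues the chain
            last = successor[last]
        # A chain with only one order reports null, as in the desired output.
        result.append((order, last if last != order else None, customer))
    return result


print(collapse_chains(ROWS))
# [(1, None, 1), (2, 7, 1)]
```

The while loop is the step that needs recursion (or an unknown number of self-joins) when expressed in SQL.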

I'm a beginner in SAS SQL. I've tried joining the table to itself with a LEFT JOIN and using a HAVING clause, but nothing worked. Is there any way to get a query result like the second table?

Thank you in advance.

EDIT 2: here is the SQL I wrote that comes closest:

SELECT t1.order_id, t1.base_order_id AS last_order_id, t1.customer_id
FROM table1 t1
GROUP BY t1.order_id
HAVING (t1.order_id = max(t1.base_order_id)
        OR t1.base_order_id IS NULL)

So, as Gordon says in the comments, I cannot think of a way to do this in PROC SQL, which does not support recursive queries.

However, a "SAS" way to do this is a connected component analysis. PROC OPTNET in SAS/OR does exactly this with its CONCOMP statement.

data have;
input order_id base_order_id customer_id;
datalines;
1 . 1
2 . 1
3 2 1
4 3 1
5 4 1
6 5 1
7 6 1
;
run;

/*Connect the first order to itself*/
data have;
set have;
if base_order_id = . then base_order_id = order_id;
run;

/*Use SAS/OR and connected components*/
proc optnet
    data_links = have(rename=(order_id = to base_order_id = from))
    out_nodes = out;
    concomp;
run;

/*Summarize each component and add customer id*/
proc sql noprint;
create table want as 
select a.order_id,
       a.last_order_id,
       b.customer_id
    from (
        select min(node) as order_id,
               /*single-order chains get a missing last_order_id*/
               case when count(*) = 1 then . else max(node) end as last_order_id
            from out
            group by concomp
    ) as a
      left join
    have as b
      on a.order_id = b.order_id;
quit;

This returns what you are looking for in the WANT dataset.
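For readers without SAS/OR, the same pipeline can be sketched in plain Python. It uses a union-find (disjoint-set) structure in place of OPTNET's CONCOMP; this is a swapped-in technique for illustration, not a description of how OPTNET works internally. LINKS is the example data after the self-link step, written as (from, to) = (base_order_id, order_id) pairs. Taking min and max per component assumes order IDs increase along each chain, the same assumption the PROC SQL summary makes, and a single-order component reports None (null), matching the question's expected output.

```python
from collections import defaultdict

# Links after the self-link step, as (from, to) = (base_order_id, order_id).
LINKS = [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]


def component_labels(links):
    """Union-find: map every node to a representative of its connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two components

    # One label per node, analogous to the concomp column in OPTNET's out_nodes.
    return {node: find(node) for node in list(parent)}


def summarize(links):
    """Per component: first order (min), last order (max), None for single-order chains."""
    members = defaultdict(list)
    for node, label in component_labels(links).items():
        members[label].append(node)
    result = []
    for nodes in members.values():
        first, last = min(nodes), max(nodes)
        result.append((first, last if len(nodes) > 1 else None))
    return sorted(result)


print(summarize(LINKS))
# [(1, None), (2, 7)]
```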

The only way I am aware of requires adding two new columns to the table you are querying. A good explanation can be found here:

http://www.sitepoint.com/hierarchical-data-database-2/

I don't have time right now to transcribe that into an SO answer. I will update this answer later with code that adds the two new columns to your example dataset; IMO, that is the hardest part anyway.

Some nice things about this approach:

  1. It is not recursive - it allows any SQL query to traverse the entire hierarchy in a single pass.
  2. It supports indexing so if you have a lot of data your queries will run faster.
  3. It is pretty easy to understand/query against. Simple queries can return powerful results. I've used this approach in a complex supply chain environment to identify future bottlenecks with a single simple SQL query.
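The promised code never appears in this answer. As a hedged sketch, assuming the two extra columns are the left/right numbers of the nested set ("modified preorder tree traversal") model, which is what I take the linked article to describe, here is Python that computes them with a depth-first walk and then answers the question with one non-recursive SQL query through the standard-library sqlite3 module. The table name `orders` and columns `lft`/`rgt` are hypothetical, customer_id is omitted for brevity, and the query assumes each chain has exactly one leaf (no branching).

```python
import sqlite3
from collections import defaultdict

# Example table from the question: (order_id, base_order_id); None = missing.
ROWS = [(1, None), (2, None), (3, 2), (4, 3), (5, 4), (6, 5), (7, 6)]


def nested_set_bounds(rows):
    """Number nodes in depth-first order, giving each a (lft, rgt) interval.

    Every descendant's interval nests strictly inside its ancestor's interval.
    """
    children = defaultdict(list)
    roots = []
    for order_id, base_id in rows:
        if base_id is None:
            roots.append(order_id)
        else:
            children[base_id].append(order_id)

    bounds = {}
    counter = 1
    for root in sorted(roots):
        # Iterative DFS with (node, finished) pairs avoids deep recursion on long chains.
        stack = [(root, False)]
        while stack:
            node, finished = stack.pop()
            if finished:
                bounds[node] = (bounds[node][0], counter)  # close the interval
            else:
                bounds[node] = (counter, None)  # open the interval
                stack.append((node, True))
                stack.extend((c, False) for c in sorted(children[node], reverse=True))
            counter += 1
    return bounds


def build_table(rows):
    """Load the rows plus their (lft, rgt) numbers into an in-memory SQLite table."""
    bounds = nested_set_bounds(rows)
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (order_id INTEGER, base_order_id INTEGER, lft INTEGER, rgt INTEGER)"
    )
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(o, b, *bounds[o]) for o, b in rows],
    )
    return con


# A leaf (rgt = lft + 1) inside a root's (lft, rgt) range is that chain's last order.
LAST_ORDER_SQL = """
SELECT p.order_id,
       (SELECT c.order_id FROM orders AS c
         WHERE c.lft > p.lft AND c.rgt < p.rgt AND c.rgt = c.lft + 1) AS last_order_id
FROM orders AS p
WHERE p.base_order_id IS NULL
ORDER BY p.order_id
"""

con = build_table(ROWS)
print(con.execute(LAST_ORDER_SQL).fetchall())
# [(1, None), (2, 7)]
```

The range test `c.lft > p.lft AND c.rgt < p.rgt` is what makes querying non-recursive: every descendant of `p` satisfies it, so an index on the two columns can serve hierarchy queries directly. The cost moves to maintenance, since inserting an order means renumbering the intervals to its right.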


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM