I'm trying to write an SQL query (PROC SQL) to get data from the following example table:
order_id  base_order_id  customer_id
=====================================
1         null           1            //only one transaction
-------------------------------------
2         null           1            //order_start
3         2              1
4         3              1
5         4              1
6         5              1
7         6              1            //order_end
-------------------------------------
and return it in this form:
order_id  last_order_id  customer_id
1         null           1
2         7              1
Let me put it this way: order_id 2 starts a chain of six records (order_ids 2 to 7). Each record points to the previous one through base_order_id. We can assume that the complete order for that one client consists of order_ids 2 to 7: the order starts at order_id 2 and ends at order_id 7.
I'm a beginner in SAS SQL. I've tried joining the table to itself with a LEFT JOIN and using a HAVING clause, but nothing worked well. Is there any way to get a query result like table 2?
Thank you in advance.
EDIT 2: here is the SQL I wrote that gets closest to the result:
SELECT t1.order_id, t1.base_order_id as last_order_id, t1.customer_id
FROM table1 t1
GROUP BY t1.order_id
HAVING (t1.order_id = max(t1.base_order_id)
or t1.base_order_id IS NULL)
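So, as Gordon says in the comments, I cannot think of a way to do this in PROC SQL alone.
However, a "SAS" way to do this would be a connected component analysis. Treat every order_id as a node and every order_id/base_order_id pair as a link; each connected component is then one complete order. PROC OPTNET in SAS/OR does exactly this.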
data have;
input order_id base_order_id customer_id;
datalines;
1 . 1
2 . 1
3 2 1
4 3 1
5 4 1
6 5 1
7 6 1
;
run;
/*Connect the first order to itself*/
data have;
set have;
if base_order_id = . then base_order_id = order_id;
run;
/*Use SAS/OR and connected components*/
proc optnet
data_links = have(rename=(order_id = to base_order_id = from))
out_nodes = out;
concomp;
run;
/*Summarize and add customer id*/
proc sql noprint;
create table want as
select a.order_id,
a.last_order_id,
b.customer_id
from (
select min(node) as order_id,
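case when max(node) > min(node) then max(node) else . end as last_order_id /* missing for single-order chains, as in table 2 */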
from out
group by concomp
) as a
left join
have as b
on a.order_id = b.order_id;
quit;
This returns what you are looking for in the WANT dataset.
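If SAS/OR is not licensed, the same grouping can be sketched in Base SAS. Instead of PROC OPTNET's connected components, a DATA step hash lookup walks each order's base_order_id links up to the order that starts its chain. This is a minimal sketch under stated assumptions, not a tested drop-in:

- It starts from the original data, before the self-link step above, so chain starts still have a missing base_order_id.
- The links contain no cycles.
- Order ids increase along each chain.
- Each chain belongs to one customer.

The data set names `have_raw`, `roots` and `want_base` are made up for illustration.

```sas
/* Original example data: chain starts have a missing base_order_id */
data have_raw;
  input order_id base_order_id customer_id;
  datalines;
1 . 1
2 . 1
3 2 1
4 3 1
5 4 1
6 5 1
7 6 1
;
run;

/* For every order, follow base_order_id upwards to the order that starts the chain */
data roots(keep=order_id customer_id root_id);
  length node up 8;                      /* host variables for the hash lookup */
  if _n_ = 1 then do;
    declare hash parent(dataset: "have_raw(rename=(order_id=node base_order_id=up))");
    parent.defineKey("node");
    parent.defineData("up");
    parent.defineDone();
    call missing(node, up);
  end;
  set have_raw;
  root_id = order_id;
  up = base_order_id;
  do while (not missing(up));            /* assumes no cycles in the links */
    root_id = up;
    node = up;
    if parent.find() ne 0 then leave;    /* link to an unknown order: stop here */
  end;
run;

/* One row per chain; last_order_id stays missing for single-order chains */
proc sql;
  create table want_base as
  select r.root_id as order_id,
         case when count(*) > 1 then max(r.order_id) else . end as last_order_id,
         max(r.customer_id) as customer_id   /* assumes one customer per chain */
  from roots as r
  group by r.root_id
  order by 1;
quit;
```

With the example data, WANT_BASE should hold the same two rows as table 2 in the question.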
The only way I am aware of requires adding two new columns to the data you are querying against. A nice explanation of it can be found here:
http://www.sitepoint.com/hierarchical-data-database-2/
I don't have the time right now to transcribe that and put it all into an SO answer. I will modify this answer later with some code that will add the 2 new columns to your example dataset. IMO, this is the hardest part anyway.
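For illustration only, here is a minimal Base SAS sketch of the two-column idea. It assumes the two columns are the left/right numbers of the nested set ("modified preorder tree traversal") model described in the linked article. It also assumes, as in the question, that every order has at most one follow-up order.

Under those assumptions, a chain of n orders that starts at number s gets the following values for each order at a given depth below the chain start:

- lft = s + depth + 1 on the way down.
- rgt = s + 2n - depth on the way back up.

The last order of a chain is then the leaf (rgt = lft + 1) inside the start order's [lft, rgt] range. The column names `lft`/`rgt` and all data set names are hypothetical. Branching trees would need a real preorder traversal instead of this depth formula.

```sas
/* Original example data: chain starts have a missing base_order_id */
data have_raw;
  input order_id base_order_id customer_id;
  datalines;
1 . 1
2 . 1
3 2 1
4 3 1
5 4 1
6 5 1
7 6 1
;
run;

/* Chain start (root_id) and depth of each order below it */
data walk(keep=order_id base_order_id customer_id root_id depth);
  length node up 8;                      /* host variables for the hash lookup */
  if _n_ = 1 then do;
    declare hash parent(dataset: "have_raw(rename=(order_id=node base_order_id=up))");
    parent.defineKey("node");
    parent.defineData("up");
    parent.defineDone();
    call missing(node, up);
  end;
  set have_raw;
  root_id = order_id;
  depth = 0;
  up = base_order_id;
  do while (not missing(up));            /* assumes no cycles in the links */
    root_id = up;
    depth = depth + 1;
    node = up;
    if parent.find() ne 0 then leave;    /* link to an unknown order: stop here */
  end;
run;

/* Chain length n per chain start */
proc sql;
  create table chain_len as
  select root_id, count(*) as n
  from walk
  group by root_id
  order by root_id;
quit;

/* Each chain uses 2n numbers; start_num is the number just before its range */
data offsets;
  set chain_len;
  retain start_num 0;
  output;                                /* write this chain's starting offset */
  start_num = start_num + 2*n;           /* the next chain starts after it */
run;

proc sql;
  /* Nested-set columns, valid only for linear chains */
  create table tree as
  select w.*,
         s.start_num + w.depth + 1 as lft,
         s.start_num + 2*s.n - w.depth as rgt
  from walk as w inner join offsets as s
    on w.root_id = s.root_id;

  /* For each chain start, the leaf inside its [lft, rgt] range is the last order */
  create table want_ns as
  select r.order_id,
         case when l.order_id = r.order_id then . else l.order_id end as last_order_id,
         r.customer_id
  from tree as r inner join tree as l
    on (l.lft between r.lft and r.rgt) and l.rgt = l.lft + 1
  where r.base_order_id is missing
  order by r.order_id;
quit;
```

With the example data, this sketch numbers the orders as follows:

- Order 1 gets lft = 1, rgt = 2.
- Order 2 gets lft = 3, rgt = 14.
- Order 7 gets lft = 8, rgt = 9.

As a result, WANT_NS matches table 2 in the question.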
Some nice things about this approach: