I am a SAS developer. I have an SQL query that uses GROUP BY to get the min and max of a column called "CalculatedPower". Below is the structure I get from the GROUP BY statement. The lt_dt and lp_dt columns are what I want; I show them here, but they are not in my actual table because I do not know how to produce them:
station  datetime           calculatedpower  min_power  max_power  lt_dt             lp_dt
ABBA     28AUG2018:0:0:0    100              1          100        01SEP2018:1:0:0   28AUG2018:0:0:0
ABBA     31AUG2018:12:0:0   88               1          100        01SEP2018:1:0:0   28AUG2018:0:0:0
ABBA     01SEP2018:1:0:0    1                1          100        01SEP2018:1:0:0   28AUG2018:0:0:0
ZZZZ     07SEP2018:0:0:0    900              900        3000       07SEP2018:0:0:0   21SEP2018:0:0:0
ZZZZ     09SEP2018:0:0:0    1000             900        3000       07SEP2018:0:0:0   21SEP2018:0:0:0
ZZZZ     21SEP2018:0:0:0    3000             900        3000       07SEP2018:0:0:0   21SEP2018:0:0:0
As you can see, I aggregate by station and use the MIN and MAX functions to get min_power and max_power. I now also need the datetime at which the minimum occurs (into lt_dt) and the datetime at which the maximum occurs (into lp_dt). For example, for ABBA I expect lt_dt to be 01SEP2018:1:0:0 and lp_dt to be 28AUG2018:0:0:0.
In other words, lp_dt is the datetime of max_power, and lt_dt is the datetime of min_power.
My group by statement is as below:
proc sql;
  select
      station
    , datetime
    , calculatedpower
    , min(calculatedpower) as lt_calculatedpower
    , max(calculatedpower) as lp_calculatedpower
  from sumall
  group by
    station
  ;
quit;
Is there a way to tweak my existing SQL statement to get the datetimes I want? I tried an additional SQL statement like the one below, but it is taking forever to process 600k rows, and I cannot tell whether it works because it is still running:
proc sql;
  select *,
    case when calculatedpower=lt_calculatedpower then datetime end as lt_datetime
  from minmax;
quit;
With this code, I foresee an issue if one station has several rows with the same calculated power but different datetimes.
In SQL you will need a sub-select containing a CASE expression that identifies the datetime at which the min and the max occur. The sub-select is then joined back to the original table.
Note: SAS SQL will automatically remerge summary (aggregate function) results with the detail rows when appropriate.
Example
In this example level1_id is for station, level2_seq is for datetime, and x is for calculatedpower.
data have;
  do level1_id = 1 to 5;
    do level2_seq = 1 to 5;
      x = floor(100*ranuni(123));
      output;
    end;
  end;
run;
proc sql;
  create table want as
  select
      have.*
    , min(have.x) as min_x
    , max(have.x) as max_x
    , min(at.min_at) as min_x_first_at_seq   /* earliest seq at which the group min occurs */
    , min(at.max_at) as max_x_first_at_seq   /* earliest seq at which the group max occurs */
  from
    have
  left join
    (
      /* keep the seq only on rows where x equals the group min or max */
      select inside.level1_id, inside.level2_seq
        , case when inside.x = min(inside.x) then inside.level2_seq else . end as min_at
        , case when inside.x = max(inside.x) then inside.level2_seq else . end as max_at
      from have inside
      group by inside.level1_id
    ) at
  on
    have.level1_id = at.level1_id and
    have.level2_seq = at.level2_seq
  group by
    have.level1_id
  order by
    have.level1_id, level2_seq
  ;
quit;
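For reference, here is a sketch of the same sub-select pattern written with the column names from the question (sumall, station, datetime, calculatedpower). The aliases s, i and f, the output table name want, and the datetime20. format are my own choices for illustration. It also assumes each (station, datetime) pair occurs only once in sumall; otherwise the join would multiply rows.

```sas
proc sql;
  create table want as
  select
      s.*
    , min(s.calculatedpower) as min_power
    , max(s.calculatedpower) as max_power
    , min(f.min_at) as lt_dt format=datetime20.   /* datetime of the minimum power */
    , min(f.max_at) as lp_dt format=datetime20.   /* datetime of the maximum power */
  from sumall s
  left join
    (
      /* keep the datetime only on rows that hold the station's min or max */
      select i.station, i.datetime
        , case when i.calculatedpower = min(i.calculatedpower) then i.datetime else . end as min_at
        , case when i.calculatedpower = max(i.calculatedpower) then i.datetime else . end as max_at
      from sumall i
      group by i.station
    ) f
  on s.station = f.station and s.datetime = f.datetime
  group by s.station
  order by s.station, s.datetime
  ;
quit;
```

Because the outer query takes min() of the flagged datetimes, a station whose minimum (or maximum) power occurs on several rows gets the earliest of those datetimes.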
Here's the SAS PROC SUMMARY and a DATA STEP merge to get your final desired output.
Use the MAXID, MINID options on the OUTPUT statement to get the ID of the max and ID of the minimum values.
The first part of the solution recreates your sample data; please provide data in that format in the future. PROC SUMMARY then calculates the statistics, and a merge attaches them to every row. This should complete really quickly on your system, as in less than a minute.
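data have;
  /* the colon modifier makes the informat stop at the next blank
     instead of reading a fixed number of columns */
  input station $ datetime :anydtdtm. calculatedpower;
  format datetime datetime.;
  cards;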
ABBA 28AUG2018:0:0:0 100
ABBA 31AUG2018:12:0:0 88
ABBA 01SEP2018:1:0:0 1
ZZZZ 07SEP2018:0:0:0 900
ZZZZ 09SEP2018:0:0:0 1000
ZZZZ 21SEP2018:0:0:0 3000
;;;;
run;
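proc summary data=have nway;
  class station;
  var calculatedPower;
  /* MINID/MAXID name the ID variable explicitly, so no ID statement is needed
     and datetime itself is not carried into the summary data set */
  output out=summary(drop=_type_ _freq_)
    min=min_power max=max_power
    minid(calculatedPower(datetime))=min_date
    maxid(calculatedPower(datetime))=max_date;
run;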
/* both data sets must be sorted by station for the BY merge */
data final;
  merge have summary;
  by station;
run;
EDIT: removed autoname option, since I explicitly named the output variables.
EDIT2: When multiple observations contain the same extreme values in all the MIN or MAX variables, PROC MEANS uses the observation number to resolve which observation to write to the output. By default, PROC MEANS uses the first observation to resolve any ties. However, if you specify the LAST option, then PROC MEANS uses the last observation to resolve any ties.
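If you want to know whether ties at the extremes actually occur in your data before deciding how to resolve them, a query along these lines can list them. This is only a sketch against the have data set above; the table name ties and the column n_rows are names I made up for illustration. The inner query relies on SAS SQL remerging the per-station min and max onto every row.

```sas
proc sql;
  create table ties as
  select station, calculatedpower, count(*) as n_rows
  from
    (
      /* attach each station's min and max to every detail row */
      select station, calculatedpower
        , min(calculatedpower) as min_power
        , max(calculatedpower) as max_power
      from have
      group by station
    )
  /* keep only the rows that sit at an extreme */
  where calculatedpower = min_power or calculatedpower = max_power
  group by station, calculatedpower
  having count(*) > 1
  ;
quit;
```

An empty ties table means each station's minimum and maximum occur on exactly one row, so the tie-breaking rule never comes into play.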