
How do I get the datetime at which the min and max occur in a GROUP BY?

I am a SAS developer. I have a PROC SQL query that groups by station to get the min and max of a column called "CalculatedPower". Below is the structure I get from the GROUP BY query. The lt_dt and lp_dt columns are what I want. I show them here, but they are not in my actual table because I do not know how to produce them:

station datetime        calculatedpower min_power   max_power   lt_dt               lp_dt
ABBA    28AUG2018:0:0:0     100         1            100        01SEP2018:1:0:0     28AUG2018:0:0:0
ABBA    31AUG2018:12:0:0    88          1            100        01SEP2018:1:0:0     28AUG2018:0:0:0
ABBA    01SEP2018:1:0:0     1           1            100        01SEP2018:1:0:0     28AUG2018:0:0:0
ZZZZ    07SEP2018:0:0:0     900         900          3000       07SEP2018:0:0:0     21SEP2018:0:0:0
ZZZZ    09SEP2018:0:0:0     1000        900          3000       07SEP2018:0:0:0     21SEP2018:0:0:0
ZZZZ    21SEP2018:0:0:0     3000        900          3000       07SEP2018:0:0:0     21SEP2018:0:0:0

As you can see, I group by station and use the MIN and MAX functions to get min_power and max_power. Now I also need the datetime at which the minimum occurs (lt_dt) and the datetime at which the maximum occurs (lp_dt). For example, for ABBA I expect lt_dt = 01SEP2018:1:0:0 and lp_dt = 28AUG2018:0:0:0.

In other words, lp_dt is the datetime of max_power and lt_dt is the datetime of min_power.

My GROUP BY query is as follows:

proc sql;
select 
station
,datetime
,calculatedpower
,min(calculatedpower) as lt_calculatedpower
,max(calculatedpower) as lp_calculatedpower
from sumall
group by 
station
;
quit;

Is there a way to adjust my existing SQL query to get the datetimes I want? I tried the additional query below, but it is taking forever on 600k rows, and I am not sure whether it works because it is still running:

proc sql;
select *,
case when calculatedpower=lt_calculatedpower then datetime end as lt_datetime
from minmax;
quit;

With this code, I expect a problem when several rows for the same station have the same calculatedpower but different datetimes (ties).

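In SQL you need a sub-select with a CASE expression that identifies the datetime at which the min and the max occur. That sub-select is then joined back to the original table.

Note: when the SELECT list mixes aggregate functions with non-grouped columns, SAS SQL automatically remerges the summary (aggregate function) results back onto the detail rows; the log reports this with a NOTE about remerging summary statistics.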

Example

In this example, level1_id stands in for station, level2_seq for datetime, and x for calculatedpower.

data have;
  do level1_id = 1 to 5;
    do level2_seq = 1 to 5;
      x = floor(100*ranuni(123));
      output;
    end;
  end;
run;

proc sql;
  create table want as
  select 
    have.*
    , min(have.x) as min_x
    , max(have.x) as max_x
    /* earliest level2_seq at which the min / max occurs (resolves ties) */
    , min(at.min_at) as min_x_first_at_seq
    , min(at.max_at) as max_x_first_at_seq
  from 
    have
  left join 
  (
    /* remerge: keep level2_seq only on rows equal to the group min / max */
    select inside.level1_id, inside.level2_seq
    , case when inside.x = min(inside.x) then inside.level2_seq else . end as min_at
    , case when inside.x = max(inside.x) then inside.level2_seq else . end as max_at
    from have inside
    group by inside.level1_id
  ) at
  on
    have.level1_id = at.level1_id and
    have.level2_seq = at.level2_seq
  group by
    have.level1_id
  order by
    have.level1_id, have.level2_seq
  ;
quit;
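Applied directly to the question's column names, the same remerge idea can be sketched as two PROC SQL steps without a join. This is a sketch, assuming the input table is SUMALL as in the question; the sample rows are the question's, re-typed with full datetime values. Taking MIN() of the CASE result resolves ties to the earliest matching datetime; MAX() would prefer the latest. Writing the result with CREATE TABLE, rather than printing every row to the results window, probably also avoids much of the slowness described for 600k rows.

```sas
/* Hypothetical sample: the question's rows, in a table named SUMALL */
data sumall;
  input station $ datetime :datetime20. calculatedpower;
  format datetime datetime20.;
cards;
ABBA 28AUG2018:00:00:00 100
ABBA 31AUG2018:12:00:00 88
ABBA 01SEP2018:01:00:00 1
ZZZZ 07SEP2018:00:00:00 900
ZZZZ 09SEP2018:00:00:00 1000
ZZZZ 21SEP2018:00:00:00 3000
;
run;

proc sql;
  /* Step 1: per-station min/max, remerged onto every detail row */
  create table minmax as
  select station, datetime, calculatedpower,
         min(calculatedpower) as min_power,
         max(calculatedpower) as max_power
  from sumall
  group by station;

  /* Step 2: CASE keeps the datetime only on rows that hit the extreme;  */
  /* MIN() collapses ties to the earliest such datetime, then remerges   */
  create table want as
  select *,
         min(case when calculatedpower = min_power then datetime end)
           as lt_dt format=datetime20.,
         min(case when calculatedpower = max_power then datetime end)
           as lp_dt format=datetime20.
  from minmax
  group by station
  order by station, datetime;
quit;
```

Every detail row keeps its own datetime and calculatedpower, and each station's rows share the same min_power, max_power, lt_dt and lp_dt, matching the layout in the question.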

Here is a PROC SUMMARY plus DATA step MERGE approach that produces your final desired output.

Use the MINID= and MAXID= options on the OUTPUT statement to get the ID value (here, datetime) of the observation that holds the minimum and the maximum.

The first part of the solution generates your fake data; please provide data in that format in the future. PROC SUMMARY then calculates the statistics, and the DATA step merges them back onto the detail rows. This should run very quickly on your system, well under a minute.

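/* Sample data in the same layout as the question */
data have;
  input station $ datetime :anydtdtm. calculatedpower;
  format datetime datetime.;
cards;
ABBA    28AUG2018:0:0:0     100
ABBA    31AUG2018:12:0:0    88
ABBA    01SEP2018:1:0:0     1
ZZZZ    07SEP2018:0:0:0     900
ZZZZ    09SEP2018:0:0:0     1000
ZZZZ    21SEP2018:0:0:0     3000
;;;;
run;

/* One row per station: min/max power plus the datetime of the        */
/* observation holding each extreme. The ID statement also writes     */
/* DATETIME itself to the output; drop it so the MERGE below does not */
/* overwrite the detail datetime on the first row of each station.    */
proc summary data=have nway;
  class station;
  id datetime;
  var calculatedpower;
  output out=summary(drop=_type_ _freq_ datetime)
         min=min_power max=max_power minid=lt_dt maxid=lp_dt;
run;

/* MERGE needs both inputs sorted by STATION (SUMMARY already is) */
proc sort data=have;
  by station;
run;

data final;
  merge have summary;
  by station;
  format lt_dt lp_dt datetime.;
run;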

EDIT: removed the AUTONAME option, since I name the output variables explicitly.

EDIT2: When multiple observations contain the same extreme values in all the MIN or MAX variables, PROC MEANS (and therefore PROC SUMMARY) uses the observation number to resolve which observation to write to the output. By default, it uses the first observation to resolve any ties. However, if you specify the LAST option, it uses the last observation to resolve any ties.

https://documentation.sas.com/?docsetId=proc&docsetTarget=p04vbvpcjg2vrjn1v8wyf0daypfi.htm&docsetVersion=9.4&locale=en#p1p58yhxlrc0can1scam7bco7y96
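Because ties go to the first observation by default, another way to decide which datetime wins is to control the input order before PROC SUMMARY reads it. The sketch below assumes a hypothetical table TIES in which ABBA reaches its minimum (1) twice; sorting each station by descending datetime makes the latest tied datetime win. It does not use the LAST option mentioned above.

```sas
/* Hypothetical data with a tie: ABBA hits its minimum of 1 twice */
data ties;
  input station $ datetime :datetime20. calculatedpower;
  format datetime datetime20.;
cards;
ABBA 28AUG2018:00:00:00 100
ABBA 31AUG2018:12:00:00 1
ABBA 01SEP2018:01:00:00 1
;
run;

/* Ties go to the first observation, so put the latest datetime */
/* first within each station to make it win the tie             */
proc sort data=ties out=ties_desc;
  by station descending datetime;
run;

proc summary data=ties_desc nway;
  class station;
  id datetime;
  var calculatedpower;
  output out=tie_summary(drop=_type_ _freq_ datetime)
         min=min_power max=max_power minid=lt_dt maxid=lp_dt;
run;
```

Run against the unsorted TIES table instead, lt_dt would be 31AUG2018:12:00:00, the first tied observation in input order.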

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM