
How do I get the date from a min and max group by?

I am a SAS developer. I have a SQL query that uses GROUP BY to get the min and max of a column called "CalculatedPower". Below is the structure I get from the GROUP BY statement. The lt_dt and lp_dt columns are what I want; I show them here, but they are not in my actual table because I do not know how to produce them:

station datetime        calculatedpower min_power   max_power   lt_dt               lp_dt
ABBA    28AUG2018:0:0:0     100         1            100        01SEP2018:1:0:0     28AUG2018:0:0:0
ABBA    31AUG2018:12:0:0    88          1            100        01SEP2018:1:0:0     28AUG2018:0:0:0
ABBA    01SEP2018:1:0:0     1           1            100        01SEP2018:1:0:0     28AUG2018:0:0:0
ZZZZ    07SEP2018:0:0:0     900         900          3000       07SEP2018:0:0:0     21SEP2018:0:0:0
ZZZZ    09SEP2018:0:0:0     1000        900          3000       07SEP2018:0:0:0     21SEP2018:0:0:0
ZZZZ    21SEP2018:0:0:0     3000        900          3000       07SEP2018:0:0:0     21SEP2018:0:0:0

As you can see, I aggregate by station and use the MIN and MAX functions to get min_power and max_power. I now also need the datetime at which the minimum occurs (into lt_dt) and the datetime at which the maximum occurs (into lp_dt). For example, for ABBA I expect lt_dt to be 01SEP2018:1:0:0 and lp_dt to be 28AUG2018:0:0:0.

In other words, lp_dt is the datetime of the row with max_power, while lt_dt is the datetime of the row with min_power.

My GROUP BY statement is as below:

proc sql;
select 
station
,datetime
,calculatedpower
,min(calculatedpower) as lt_calculatedpower
,max(calculatedpower) as lp_calculatedpower
from sumall
group by 
station
;
quit;

Is there a way to tweak my existing SQL statement to get the datetimes I want? I tried an additional SQL statement like the one below, but it is taking forever to process 600k rows, and I am not sure whether it works because it is still running:

proc sql;
select *,
case when calculatedpower=lt_calculatedpower then datetime end as lt_datetime
from minmax;
quit;

With this code, I foresee an issue if several rows for one station have the same calculated power but different datetimes.

In SQL you will need to use a sub-select that contains a CASE expression identifying the date at which the min and max occur. The sub-select is joined to the original table.

Note: SAS SQL will automatically remerge (rejoin) summary (aggregate function) results with the detail rows when appropriate.

Example

In this example, level1_id stands for station, level2_seq for datetime, and x for calculatedpower.

data have;
  do level1_id = 1 to 5;
    do level2_seq = 1 to 5;
      x = floor(100*ranuni(123));
      output;
    end;
  end;
run;

proc sql;
  create table want as
  select 
    have.*
    , min(have.x) as min_x
    , max(have.x) as max_x
    , min(at.min_at) as min_x_first_at_seq
    , min(at.max_at) as max_x_first_at_seq
  from 
    have
  left join 
  (
    select inside.level1_id, inside.level2_seq
    , case when inside.x = min(inside.x) then inside.level2_seq else . end as min_at
    , case when inside.x = max(inside.x) then inside.level2_seq else . end as max_at
    from have inside
    group by inside.level1_id
  ) at
  on
    have.level1_id = at.level1_id and
    have.level2_seq = at.level2_seq
  group by
    have.level1_id
  order by
    have.level1_id, level2_seq
  ;
quit;
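
The same technique can be sketched against the column names from the question. This is only an illustration under assumptions: it assumes the input table is sumall with columns station, datetime, and calculatedpower, as in the question's query, and the aliases s, f, and e and the table name want are made up for the example. Taking min() of the matching datetimes means that, if several rows tie on the extreme value, the earliest datetime is reported.

```sas
proc sql;
  create table want as
  select
    s.station
    , s.datetime
    , s.calculatedpower
    , e.min_power
    , e.max_power
    , e.lt_dt format=datetime20.
    , e.lp_dt format=datetime20.
  from sumall as s
  inner join
  (
    /* one row per station: the extremes and the earliest datetime of each */
    select
      f.station
      , min(f.min_power) as min_power
      , max(f.max_power) as max_power
      , min(f.min_at) as lt_dt
      , min(f.max_at) as lp_dt
    from
    (
      /* remerge: the per-station min/max are compared with every detail row */
      select
        station
        , min(calculatedpower) as min_power
        , max(calculatedpower) as max_power
        , case when calculatedpower = min(calculatedpower) then datetime else . end as min_at
        , case when calculatedpower = max(calculatedpower) then datetime else . end as max_at
      from sumall
      group by station
    ) as f
    group by f.station
  ) as e
  on s.station = e.station
  order by s.station, s.datetime
  ;
quit;
```

The middle query collapses the flagged rows to one row per station before the join, so the outer query needs no GROUP BY of its own. For the sample data in the question, station ABBA should get lt_dt from the 01SEP2018 row and lp_dt from the 28AUG2018 row.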

Here's a SAS PROC SUMMARY and a DATA step merge to get your final desired output.

Use the MAXID and MINID options on the OUTPUT statement to get the ID of the maximum and the ID of the minimum values.

The first part of the solution generates your fake data; please provide data in that format in the future. Then PROC SUMMARY calculates the statistics and you can merge them in. This should complete really quickly on your system, in less than a minute.

data have;
input station $ datetime  anydtdtm.      calculatedpower ;
format datetime datetime.;
cards;
ABBA    28AUG2018:0:0:0     100         
ABBA    31AUG2018:12:0:0    88          
ABBA    01SEP2018:1:0:0     1           
ZZZZ    07SEP2018:0:0:0     900         
ZZZZ    09SEP2018:0:0:0     1000        
ZZZZ    21SEP2018:0:0:0     3000        
;;;;
run;

proc summary data=have nway;
class station;
id datetime;
var calculatedPower;
output out=summary min=Min_power max=max_power minid=min_date  maxid=max_Date;
run;

data final;
merge have summary;
by station;
run;
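
As a variation on the step above, the MINID and MAXID options can also name the analysis variable and the ID variable explicitly, which removes the need for a separate ID statement. The sketch below assumes the have data set created above, already sorted by station. The output names lt_dt and lp_dt are taken from the question, and dropping _TYPE_ and _FREQ_ is simply a choice to keep the merged result tidy.

```sas
/* Assumes the HAVE data set from the DATA step above, sorted by station. */
proc summary data=have nway;
  class station;
  var calculatedpower;
  output out=summary (drop=_type_ _freq_)
    min=min_power
    max=max_power
    /* datetime of the row holding the minimum / maximum calculatedpower */
    minid(calculatedpower(datetime))=lt_dt
    maxid(calculatedpower(datetime))=lp_dt;
run;

/* Attach the per-station statistics to every detail row. */
data final;
  merge have summary;
  by station;
run;
```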

EDIT: removed the AUTONAME option, since I explicitly named the output variables.

EDIT2: When multiple observations contain the same extreme values in all the MIN or MAX variables, PROC MEANS uses the observation number to resolve which observation to write to the output. By default, PROC MEANS uses the first observation to resolve any ties. However, if you specify the LAST option, then PROC MEANS uses the last observation to resolve any ties.

https://documentation.sas.com/?docsetId=proc&docsetTarget=p04vbvpcjg2vrjn1v8wyf0daypfi.htm&docsetVersion=9.4&locale=en#p1p58yhxlrc0can1scam7bco7y96
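
If the last tied observation is wanted instead of the first, one possible approach is the IDGROUP option of the OUTPUT statement, which accepts the LAST keyword. Treat the following as an unverified sketch rather than a tested solution: it assumes the have data set from above, and summary_last is a made-up output name. It relies on OUT without a bracketed count returning a single observation per group and naming the result exactly as given. Check the resulting variable names in your SAS release before depending on them.

```sas
/* Assumes the HAVE data set from the answer above. */
proc summary data=have nway;
  class station;
  var calculatedpower;
  output out=summary_last (drop=_type_ _freq_)
    min=min_power
    max=max_power
    /* LAST: among tied extreme values, take the last observation */
    idgroup(min(calculatedpower) last out (datetime)=lt_dt)
    idgroup(max(calculatedpower) last out (datetime)=lp_dt);
run;
```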
