Multiple Rows of Data to Single Row by Conditions

I have data that looks like this (I added the last column to show what I want):

ID      Var 1       Date    What I Want
aa11     Stage I    1980    Delete
aa11     Stage 2    1980    Keep
aa22     Stage 1    1980    Keep
aa22     Stage 2    1990    Delete
aa33     Stage 3    1992    Keep

But I want it to look like this:

ID       Var 1      Date
aa11     Stage 2    1980
aa22     Stage 1    1980
aa33     Stage 3    1992

I want a single row of data per ID, chosen by these conditions:

1. Take the entry with the earliest date.
2. If two entries fall in the same year, take the one with the higher stage (Var 1).
3. Otherwise, take the only entry given.

How would you write a succinct piece of SQL code or a SAS data step for this?

This is a prioritization query, and these are tricky. Here is a method that uses MySQL user variables to enumerate the rows:

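select t.*
from (select t.*,
             -- seqnum restarts at 1 whenever id changes
             (@rn := if(@id = id, @rn + 1,
                        if(@id := id, 1, 1)
                       )
             ) as seqnum
      from t cross join
           (select @rn := 0, @id := '') params
      -- earliest date first; within the same date, highest stage first
      order by id, date asc, var1 desc
     ) t
where seqnum = 1;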

The order by clause handles the prioritization logic: within each id, rows are sorted by the additional keys and enumerated in that order. The outer query then keeps the first row for each id.
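
The same "sort by priority, number the rows per id, keep number 1" idea can be sketched outside the database. The following is only an illustration in Python, not part of the original answer: the sample rows mirror the question's table (reading 'Stage I' as 'Stage 1'), and the helper names `stage_number` and `first_row_per_id` are hypothetical. It also assumes every label has the form 'Stage <n>'.

```python
# Sample rows mirroring the question's table (assumption: "Stage I" means "Stage 1").
rows = [
    ("aa11", "Stage 1", 1980),
    ("aa11", "Stage 2", 1980),
    ("aa22", "Stage 1", 1980),
    ("aa22", "Stage 2", 1990),
    ("aa33", "Stage 3", 1992),
]


def stage_number(var1):
    """Hypothetical helper: assumes labels look like 'Stage <n>' and returns n."""
    return int(var1.split()[-1])


def first_row_per_id(rows):
    # Priority order: id, then earliest date, then highest stage.
    ordered = sorted(rows, key=lambda r: (r[0], r[2], -stage_number(r[1])))
    result = []
    prev_id = None
    seqnum = 0
    for row in ordered:
        # Restart the counter whenever the id changes, like the @rn/@id variables.
        seqnum = seqnum + 1 if row[0] == prev_id else 1
        prev_id = row[0]
        if seqnum == 1:
            result.append(row)
    return result


print(first_row_per_id(rows))
```

Comparing the numeric part of the stage, rather than the raw string, avoids surprises if a label such as 'Stage 10' ever appears.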

I believe aggregation can be used here without any variables, but I must make one assumption first: the Var 1 values can be ordered sequentially, i.e. Stage 1 < Stage 2 < Stage 3, etc.

If so, you could write the following to return what you are looking for:

select
    a.ID
    --Aggregate results by Max Var1 value
    , max(a.Var1) as Var1
    , a.[Date]
from
    [YourTable] a
    --Derived Table to return the lowest Date for each ID
    inner join
        (
            select
                ID
                , min([Date]) as [Date]
            from
                [YourTable]
            group by
                ID
        ) b on a.ID = b.ID and a.[Date] = b.[Date]
group by
    a.ID
    , a.[Date]

If an ID has only one row, that row is returned, since it holds both the MIN Date value and the MAX Var1 value for that ID.
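
To try the aggregation idea without a SQL Server instance, here is a small sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. The choice of SQLite is an assumption made only for this illustration, and the sample rows again read 'Stage I' as 'Stage 1'. The query in this sketch takes the minimum Date per ID, joins back on both ID and Date, and then takes MAX(Var1). It relies on the stated assumption that the Var1 strings sort in stage order.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE YourTable (ID TEXT, Var1 TEXT, "Date" INTEGER)')
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("aa11", "Stage 1", 1980),
        ("aa11", "Stage 2", 1980),
        ("aa22", "Stage 1", 1980),
        ("aa22", "Stage 2", 1990),
        ("aa33", "Stage 3", 1992),
    ],
)

# b holds the earliest Date per ID; the join keeps only rows from that Date,
# and MAX(Var1) picks the highest stage among them (string comparison).
query = """
SELECT a.ID, MAX(a.Var1) AS Var1, a."Date"
FROM YourTable a
JOIN (
    SELECT ID, MIN("Date") AS "Date"
    FROM YourTable
    GROUP BY ID
) b ON a.ID = b.ID AND a."Date" = b."Date"
GROUP BY a.ID, a."Date"
ORDER BY a.ID
"""

result = conn.execute(query).fetchall()
print(result)
conn.close()
```

Because MAX is applied to text, a label such as 'Stage 10' would sort below 'Stage 2'; with labels like that, a numeric stage column would be the safer thing to aggregate.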

In SAS this is very simple with a data step. Just sort the data in the required order, then use first.id in a data step to keep the first row for each id. I've assumed that 'Stage I' in your post is a typo and should say 'Stage 1'.

/* create original data */
data have;
infile datalines dsd;
input ID $ Var_1 $ Date; 
datalines;
aa11,Stage 1,1980
aa11,Stage 2,1980
aa22,Stage 1,1980
aa22,Stage 2,1990
aa33,Stage 3,1992
;
run;

/* sort dataset */
proc sort data=have;
by id date descending var_1;
run;

/* extract first id only */
data want;
set have;
by id;
if first.id;
run;
