How to select the last something by group in SAS

Question

Say I have a table Tbl that is sorted by three columns {a, b, c}. It also has another 100 columns, one of which is d. How can I flag the last row within each group where d = something? The flag should be a new column. Ideally this can be done without re-sorting the whole table.

a b c ...many columns... d IDX
1                        5 1                        
1                        3 2
1                        3 3
2                        3 4 
2                        3 5
2                        2 6
2                        2 7

On this table we want to add another column, newCol, that flags the last row within each group of a where d = 3:

a b c ...many columns... d IDX newCol
1                        5 1   0                     
1                        3 2   0
1                        3 3   1
2                        3 4   0
2                        3 5   1
2                        2 6   0
2                        2 7   0

Answer 1

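data want;
    set have;
    /* NOTSORTED: BY groups only need to be contiguous, not in sorted order */
    by a d notsorted;
    /* 1 on the last row of a d=3 run within a, 0 on every other row */
    newCol = (last.d and d = 3);
run;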

This requires the dataset to be arranged in a useful fashion: it doesn't have to be in order by d, but all rows with the same value of d must sit together within each a group (i.e., 3 3 1 3 4 1 2 3 will not work, but 3 3 3 3 4 1 1 2 is fine).
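As a quick check of the data step approach, here is a minimal sketch that rebuilds the question's sample table and applies the BY ... NOTSORTED logic. The dataset names have and want, and the choice to keep only the columns a, d, and IDX, are assumptions made for illustration.

```sas
/* Sample data mirroring the question's table (only a, d, IDX are kept) */
data have;
    input a d IDX;
    datalines;
1 5 1
1 3 2
1 3 3
2 3 4
2 3 5
2 2 6
2 2 7
;
run;

data want;
    set have;
    /* rows are grouped by a, and equal d values are contiguous within a */
    by a d notsorted;
    /* a logical expression evaluates to 1 (true) or 0 (false) in SAS */
    newCol = (last.d and d = 3);
run;

proc print data=want noobs;
run;
```

On this input, newCol should be 1 for IDX 3 and IDX 5 and 0 elsewhere, which matches the table requested in the question.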

If that's not the case, then there isn't a solution that doesn't rely on sorting in some fashion, whether it be SQL (which does sort the data, it just doesn't tell you it's doing it), PROC SORT, or a hash table (which, if you can fit everything into memory, might be the fastest sort).
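For the hash table route mentioned above, a rough two-pass sketch could look like the following. It is not the answerer's code: the variable names rownum and lastrow, the hash object name h, and the sample data (where the d = 3 rows are deliberately not contiguous) are all assumptions. The first pass records, for each value of a, the row number of the last d = 3 row; the second pass rereads the table and compares each row number against that stored value.

```sas
/* Hypothetical sample: within a=1 the d=3 rows are split by a d=5 row */
data have;
    input a d;
    datalines;
1 3
1 5
1 3
2 2
2 3
2 2
;
run;

data want;
    if _n_ = 1 then do;
        /* hash keyed on a, storing the row number of the last d=3 row */
        declare hash h();
        h.defineKey('a');
        h.defineData('lastrow');
        h.defineDone();
        /* Pass 1: scan the whole table once, overwriting earlier matches */
        do rownum = 1 by 1 until (eof);
            set have(keep=a d) end=eof;
            if d = 3 then do;
                lastrow = rownum;
                h.replace();
            end;
        end;
    end;
    /* Pass 2: one row per step iteration, so _n_ is the current row number */
    set have;
    if h.find() = 0 then newCol = (_n_ = lastrow);
    else newCol = 0;   /* this value of a has no d=3 row at all */
    drop rownum lastrow;
run;
```

With this sample, rows 3 and 5 should end up with newCol = 1. The table is read twice, but the row order is never changed; only one hash entry per distinct value of a has to fit in memory.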

Answer 2

I'm not sure how this is implemented internally, but the following does what you want:

proc sql;
    /* keep only the rows where c equals the maximum of c in its (a, b) group */
    select a, b, c, . . .
    from t
    group by a, b
    having c = max(c);
quit;

Note that this syntax is quite specific to SAS proc sql. It is not ANSI standard and will not work in most other databases.

This uses a process called "remerging". I'm not sure whether it re-sorts the original table.

EDIT:

Flagging the rows is just as easy:

proc sql;
    /* 'Y' on rows where c equals the maximum of c in its (a, b) group, else 'N' */
    select a, b, c, (case when c = max(c) then 'Y' else 'N' end) as flag, . . .
    from t
    group by a, b;
quit;

However, if the data is already sorted, it is probably more efficient to use a data step for this purpose.
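To tie the remerging idea back to the question's table, here is one possible adaptation, offered as a sketch only. It assumes the IDX column from the question reflects the original row order and can be used to identify the last d = 3 row per a; the dataset names have and want are likewise assumptions.

```sas
/* Sample data mirroring the question's table (only a, d, IDX are kept) */
data have;
    input a d IDX;
    datalines;
1 5 1
1 3 2
1 3 3
2 3 4
2 3 5
2 2 6
2 2 7
;
run;

proc sql;
    create table want as
    select *,
           /* largest IDX among the d=3 rows of each a group is remerged
              onto every row of that group, then compared row by row */
           (d = 3 and IDX = max(case when d = 3 then IDX else . end)) as newCol
    from have
    group by a
    order by IDX;   /* restore the original row order */
quit;
```

On this input, newCol should be 1 for IDX 3 and IDX 5. The log should show a note that the query required remerging summary statistics back with the original data, and the ORDER BY clause means a sort still takes place.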

2 answers

Answer 1
1 accepted 2013-08-30 14:07:09

Answer 2
0 2013-08-30 11:43:41
