How can I select the last row by group where a condition holds in SAS?
Say I have a table Tbl that is sorted by 3 columns {a,b,c}. It also has another 100 columns, one of which is d. How can I flag the last row within a group for which d=something? The flag should be a new column. Hopefully this is doable without re-sorting the whole table.
a  b  c  ...many columns...  d  IDX
1                            5  1
1                            3  2
1                            3  3
2                            3  4
2                            3  5
2                            2  6
2                            2  7
On this table we want to add another column, newCol, to flag the last row within each group of a where d = 3:
a  b  c  ...many columns...  d  IDX  newCol
1                            5  1    0
1                            3  2    0
1                            3  3    1
2                            3  4    0
2                            3  5    1
2                            2  6    0
2                            2  7    0
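data want;
  set have;
  /* NOTSORTED: d only has to be grouped within a, not in sorted order */
  by a d notsorted;
  /* 1 on the last row of a run of d=3, 0 on every other row */
  flag = (last.d and d = 3);
run;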
This requires the dataset to be ordered in a useful fashion: it doesn't have to be sorted by d, but within each group all rows with the same value of d must sit together (i.e., 3 3 1 3 4 1 2 3 will not work, but 3 3 3 3 4 1 1 2 is fine).
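To see why the grouping requirement matters, here is a small sketch on made-up data (the dataset names have_split and want_split and the values are hypothetical, not from the question). Within a=1 the d=3 rows are split into two runs, so the step flags the end of each run rather than only the last d=3 row of the group.

```sas
/* Hypothetical data: in group a=1 the d=3 rows are not contiguous */
data have_split;
  input a d idx;
  datalines;
1 3 1
1 5 2
1 3 3
2 3 4
2 2 5
;
run;

data want_split;
  set have_split;
  by a d notsorted;
  /* last.d is set whenever d (or a) changes on the next row */
  flag = (last.d and d = 3);
run;

proc print data=want_split noobs;
run;
```

With this input, idx 1 and idx 3 are both flagged in group a=1, which is one flag too many for the stated goal.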
If that's not the case, then there isn't a solution that doesn't rely on sorting in some fashion, whether it be SQL (which does sort the data, it just doesn't tell you it's doing it), PROC SORT, or a hash table (which, if you can fit everything into memory, might be the fastest sort).
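If the table is at least in order by a, as the question says it is, one option that leans only on that existing order is a two-pass read of each group (often called a double DOW loop). The sketch below is an illustration, not part of the original answer; the names want_dow and last_pos are made up, and the sample rows are the ones from the question with only a, d and IDX kept.

```sas
/* Sample rows from the question (columns b, c and the others omitted) */
data have;
  input a d idx;
  datalines;
1 5 1
1 3 2
1 3 3
2 3 4
2 3 5
2 2 6
2 2 7
;
run;

data want_dow;
  /* Pass 1: read one a-group and remember the position of its last d=3 row */
  last_pos = 0;
  do _n_ = 1 by 1 until (last.a);
    set have;
    by a;
    if d = 3 then last_pos = _n_;
  end;
  /* Pass 2: re-read the same group and flag only that position */
  do _n_ = 1 by 1 until (last.a);
    set have;
    by a;
    newCol = (_n_ = last_pos);
    output;
  end;
  drop last_pos;
run;
```

Each group is read twice, but no sort is performed and the d=3 rows do not need to be contiguous. On the question's data this flags IDX 3 and IDX 5. If the table is grouped by a but not in ascending order, adding notsorted to both by statements should be enough.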
I'm not sure how this gets implemented, but the following does the work that you want:
proc sql;
  /* keep, per (a, b) group, only the rows holding the largest c */
  select a, b, c, . . .
  from t
  group by a, b
  having c = max(c);
quit;
Note that this syntax is quite specific to SAS proc sql. It is not ANSI standard and will not work in most other databases. It uses a process called "remerging". I'm not sure whether it re-sorts the original table.
EDIT:
Flagging the lines is just as easy:
proc sql;
  /* flag, per (a, b) group, the rows holding the largest c */
  select a, b, c, (case when c = max(c) then 'Y' else 'N' end) as flag, . . .
  from t
  group by a, b;
quit;
However, if the data is already sorted, it is probably more efficient to use a data step for this purpose.
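The same remerging idea can be pointed at the exact condition in the question (last row per a where d = 3). This is a sketch under one assumption: that IDX is a column that increases with row order, as it appears to in the sample table, since SQL itself has no notion of "last row". The table name want_sql is made up.

```sas
/* Sample rows from the question (columns b, c and the others omitted) */
data have;
  input a d idx;
  datalines;
1 5 1
1 3 2
1 3 3
2 3 4
2 3 5
2 2 6
2 2 7
;
run;

proc sql;
  create table want_sql as
  select a, d, idx,
         /* the inner CASE keeps idx only on d=3 rows, so MAX is the last such idx per a */
         case when d = 3
               and idx = max(case when d = 3 then idx else . end)
              then 1 else 0 end as newCol
  from have
  group by a
  order by idx;  /* remerging does not promise the original row order */
quit;
```

SAS writes a note to the log that the query requires remerging summary statistics back with the original data. On the sample rows, newCol is 1 for IDX 3 and IDX 5 and 0 elsewhere.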