How to create a SAS data set by extracting observations with unique keys

Question

I have a SAS data set with more than 100 variables. Two of them identify observations in the data set: pid, a character variable, and year, a numeric variable.

How can I create a new data set containing only the observations whose pid and year combination is unique? That is, if a given pid and year combination occurs more than once, I want to delete all of the associated observations, not just the duplicates.

Answer 1

I don't use the data step much. I use proc sql, which I find easier.

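proc sql;
    create table new_dataset as
    /* a.* avoids duplicate pid/year columns coming from the subquery */
    select a.* from old_dataset as a
      join
    (select pid, year from old_dataset group by pid, year having count(1)<2)
    as b on a.pid=b.pid and a.year=b.year;
quit; /* proc sql is ended with quit, not run */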

The inner query returns only the pid and year combinations that occur exactly once. Any combination that occurs more than once is excluded by having count(1)<2. Joining back to the original table on pid and year then retrieves the full observations for those combinations. This does not require sorting the data first.

Let me know if you have any questions.
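A shorter variant of the same idea, offered as a sketch rather than as what the query above does: PROC SQL can remerge a group-level count back onto the detail rows, so a single query with GROUP BY and HAVING may be enough. The names old_dataset and new_dataset are the ones used in this answer. This assumes the default remerge behaviour is active (the NOREMERGE option would turn it off); the log normally shows a note about remerging summary statistics.

```sas
proc sql;
    create table new_dataset as
    select *
    from old_dataset
    group by pid, year
    /* keep only rows whose (pid, year) group has exactly one member */
    having count(*) = 1;
quit;
```

The explicit join makes the two steps visible, while this form avoids the subquery and the duplicate key columns.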

Answer 2

A simple use of first. and last. in a data step will do this. Run proc sort first if the data is not already sorted by pid and year.

proc sort data=have;
by pid year;
run;

data want;
set have;
by pid year;
if first.year and last.year then output; /* only keep unique rows */
run;
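To see why the test on first.year and last.year works, the following sketch copies the automatic flags into ordinary variables so they can be printed. The sample rows, the variable x, and the data set name flags are made up for illustration; only pid and year come from the question.

```sas
/* Hypothetical sample data: the pair (A, 2001) occurs twice */
data have;
    input pid $ year x;
    datalines;
A 2001 1
A 2001 2
A 2002 3
B 2001 4
;
run;

proc sort data=have;
    by pid year;
run;

/* first./last. variables are temporary, so copy them to inspect them */
data flags;
    set have;
    by pid year;
    first_year = first.year;   /* 1 on the first row of a pid/year group */
    last_year  = last.year;    /* 1 on the last row of a pid/year group  */
    keep_row   = (first.year and last.year);
run;

proc print data=flags;
run;
```

A row is both first and last in its group only when the group has a single member, so keep_row is 1 for (A, 2002) and (B, 2001) and 0 for both (A, 2001) rows.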

Answer 3

Use the NOUNIQUEKEY and UNIQUEOUT= options in proc sort for a single-step solution. Observations whose BY key is unique are removed from the sorted output and written to the UNIQUEOUT= data set.

data class;
set sashelp.class;
run;

proc sort data=class nouniquekey uniqueout=unique_data;
by sex age;
run;

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p0qh2iuz3fa6rpn1eib1gaxr0sb5.htm
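One detail worth spelling out: without an OUT= option, proc sort replaces the input data set, so after the step above class holds only the observations with repeated keys. A minimal sketch that leaves the input untouched, assuming you want both halves in separate data sets (the name nonunique_data is made up here):

```sas
/* Read sashelp.class directly; nothing is overwritten because OUT= is given */
proc sort data=sashelp.class
          out=nonunique_data        /* observations whose sex/age key repeats */
          nouniquekey
          uniqueout=unique_data;    /* observations whose sex/age key is unique */
    by sex age;
run;
```

For the question as asked, the BY statement would be by pid year, and unique_data is the data set to keep.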

Answer 4

You can generate a data set containing the combinations of pid and year that appear more than once, then merge it with the rest to remove the matches:

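/* No out= is given, so have is replaced by its de-duplicated version;
   the extra rows for repeated keys go to duplicates */
proc sort data = have nodupkey dupout = duplicates;
    by pid year;
run;

data want;
    merge have 
          duplicates(in = a keep = pid year);
    by pid year;
    if not(a); /* drop every key that had at least one duplicate */
run;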
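Because the sort above has no OUT= option, it rewrites have in place. If the original data set needs to stay intact, a variant along these lines should give the same want; the name have_sorted is hypothetical.

```sas
/* Write the de-duplicated rows to a new data set instead of replacing have */
proc sort data = have out = have_sorted nodupkey dupout = duplicates;
    by pid year;
run;

data want;
    /* have_sorted has one row per key; duplicates flags keys that repeated */
    merge have_sorted
          duplicates(in = a keep = pid year);
    by pid year;
    if not a;
run;
```

Keeping only pid and year from duplicates means the merge cannot overwrite any of the other variables.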

4 answers

Answer 1
3 Accepted 2015-03-19 19:01:05

Answer 2
2 2015-03-19 18:06:56

Answer 3
2 2015-03-19 23:01:45

Answer 4
1 2015-03-19 18:05:02
