
How to create a SAS data set extracting observations with unique keys

I have a SAS data set with more than 100 variables. Two of them, pid (a character variable) and year (a numeric variable), identify observations in the data set.

How can I create a new data set consisting only of observations that have a unique pid and year combination? That is, if a given pid and year combination occurs more than once, I want to delete all of the associated observations, not just the duplicates.

I don't use the data step much. I use proc sql, which is easier for me.

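proc sql;
    create table new_dataset as
    /* a.* avoids duplicating pid and year from the subquery in the output */
    select a.* from old_dataset as a
      join
    /* keep only the pid/year combinations that occur exactly once */
    (select pid, year from old_dataset group by pid, year having count(1)<2)
    as b on a.pid=b.pid and a.year=b.year;
quit; /* proc sql is ended with quit, not run */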

The inner query returns only the pid and year combinations that occur exactly once. Combinations that occur more than once are excluded by having count(1)<2. Joining back to the original table on pid and year then retrieves the full observations for those combinations. This approach does not require the data to be sorted.
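To see the effect on a small case, here is a sketch using a hypothetical six-row old_dataset. The variable named value and all of the data values are invented for illustration and are not from the question.

```sas
/* Hypothetical sample data: A/2001 and B/2002 each occur twice */
data old_dataset;
    input pid $ year value;
    datalines;
A 2001 10
A 2001 11
A 2002 12
B 2001 13
B 2002 14
B 2002 15
;
run;

proc sql;
    create table new_dataset as
    select a.*
    from old_dataset as a
      inner join
    (select pid, year
       from old_dataset
       group by pid, year
       having count(1) < 2) as b
      on a.pid = b.pid and a.year = b.year;
quit;

/* new_dataset holds two rows: A 2002 12 and B 2001 13 */
```

Both rows of A/2001 and both rows of B/2002 are dropped, which is the "delete all associated observations" behavior the question asks for.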

Let me know if you have any questions.

A simple use of first. and last. in a data step will do this. Run proc sort first if the data is not already sorted by pid and year.

proc sort data=have;
by pid year;
run;

data want;
set have;
by pid year;
if first.year and last.year then output; /* only keep unique rows */
run;
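Because year is the last BY variable, first.year is set on the first row of each pid and year combination and last.year on the last row, so a row with both flags set is the only row in its group. A minimal variation on the step above, assuming have is already sorted by pid and year, also keeps the rejected rows for inspection. The data set name dropped is hypothetical.

```sas
/* Assumes have is sorted by pid year; dropped is a hypothetical name */
data want dropped;
    set have;
    by pid year;
    /* a single-row BY group is both first and last */
    if first.year and last.year then output want;
    else output dropped;
run;
```

Checking dropped is a quick way to confirm that every occurrence of a repeated key was removed, not just the extra copies.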

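Use the NOUNIQUEKEY and UNIQUEOUT= options in proc sort for a single-step solution. NOUNIQUEKEY keeps only the observations with non-unique keys in the sorted output, and UNIQUEOUT= names the data set that receives the observations with unique keys.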

data class;
set sashelp.class;
run;

proc sort data=class nouniquekey uniqueout=unique_data;
by sex age;
run;

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p0qh2iuz3fa6rpn1eib1gaxr0sb5.htm
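Applied to the keys in the question, the same options might look like the sketch below. It assumes the input data set is named have and that the SAS release supports NOUNIQUEKEY and UNIQUEOUT=. The names nonunique and want are hypothetical. OUT= is added so that have itself is not replaced by the non-unique rows.

```sas
/* want receives the rows whose pid/year key occurs exactly once;  */
/* nonunique receives every row whose key occurs more than once    */
proc sort data=have out=nonunique nouniquekey uniqueout=want;
    by pid year;
run;
```

Without OUT=, proc sort writes its sorted output back over the input data set, which in the sashelp.class example above means class ends up holding only the non-unique rows.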

You can generate a data set containing the pid and year combinations that appear more than once, then merge it with the rest to remove the matches:

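/* No OUT= is given, so have is replaced by its sorted, de-duplicated version; */
/* the extra rows for repeated keys are written to duplicates                  */
proc sort data = have nodupkey dupout = duplicates;
    by pid year;
run;

data want;
    merge have 
          duplicates(in = a keep = pid year);
    by pid year;
    /* keep only keys that never appear in duplicates */
    if not(a);
run;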
