
How to delete blank observations in a data set in SAS

I want to delete ALL blank observations from a data set. I only know how to get rid of blanks from one variable:

data a;
set data(where=(var1 ne .)) ;
run;

Here I create a new data set without the observations where var1 is missing. But how do I do it when I want to get rid of ALL the blanks in the whole data set?

Thanks in advance for your answers.

If you are trying to get rid of rows where ALL variables are missing, it's quite easy:

/* Create an example with some or all columns missing */
data have;
set sashelp.class;
if _N_ in (2,5,8,13) then do;
  call missing(of _numeric_);
end;
if _N_ in (5,6,8,12) then do;
  call missing(of _character_);
end;
run;

/* This is the answer */
data want;
set have;
if compress(cats(of _all_),'.')=' ' then delete;
run;

Instead of COMPRESS, you could also set OPTIONS MISSING=' '; beforehand, so that missing numeric values are converted to blanks rather than periods.
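A minimal, self-contained sketch of the OPTIONS MISSING=' ' variant follows. The data set names `demo` and `want_all` are hypothetical. It assumes that, with MISSING=' ', CATS renders each missing numeric value as a blank, so an all-missing row concatenates to an empty string. Because OPTIONS is global, the default is restored at the end.

```sas
/* Hypothetical demo data: the second row is entirely missing.     */
/* In list input a lone period is read as missing for both types.  */
data demo;
  input name $ age height;
  datalines;
Alfred 14 69
. . .
Carol . 62.8
;
run;

options missing=' ';   /* missing numerics now convert to a blank  */

data want_all;
  set demo;
  /* CATS strips blanks, so an all-missing row yields an empty string */
  if missing(cats(of _all_)) then delete;
run;

options missing='.';   /* restore the default missing character    */
```

Only the all-missing row is dropped; the row for Carol survives because its name and height are present.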

If you want to remove ALL rows with ANY missing value, you can use the NMISS/CMISS functions.

data want;
set have;
if nmiss(of _numeric_) > 0 then delete;
run;

or

data want;
set have;
if nmiss(of _numeric_) + cmiss(of _character_) > 0 then delete;
run;

to check all character and numeric variables.
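Since CMISS accepts both numeric and character arguments, a single call over `_all_` should be enough to drop rows with any missing value. This is a sketch with hypothetical data set names `demo_any` and `want_any`.

```sas
/* Hypothetical demo data with gaps in different columns */
data demo_any;
  input name $ age height;
  datalines;
Alfred 14 69
. 13 56.5
Carol . 62.8
. . .
;
run;

data want_any;
  set demo_any;
  /* CMISS counts missing values of either type */
  if cmiss(of _all_) > 0 then delete;
run;
```

Only the fully populated Alfred row is kept.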

You can do something like this:

data myData;
set myData;
array a(*) _numeric_;
do i=1 to dim(a);
  /* MISSING() also catches special missing values such as .A */
  if missing(a(i)) then delete;
end;
drop i;
run;

This scans through all the numeric variables and deletes the observation as soon as it finds a missing value.
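The same loop can be extended to character variables with a second array. This sketch assumes the input has at least one numeric and one character variable. The names `demo_loop`, `want_loop`, and the counter `_i` are hypothetical. The counter first appears after both ARRAY statements, so it is not swept into `_numeric_`.

```sas
/* Hypothetical demo data */
data demo_loop;
  input name $ age height;
  datalines;
Alfred 14 69
. 13 56.5
Carol . 62.8
Jane 12 59.8
;
run;

data want_loop;
  set demo_loop;
  array nums{*} _numeric_;    /* age height */
  array chars{*} _character_; /* name       */
  /* delete the row at the first missing numeric value */
  do _i = 1 to dim(nums);
    if missing(nums{_i}) then delete;
  end;
  /* then at the first missing character value */
  do _i = 1 to dim(chars);
    if missing(chars{_i}) then delete;
  end;
  drop _i;
run;
```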

Here you go. This works whether the variables are character or numeric.

data withBlanks;
input a$ x y z;
datalines;
a 1 2 3
b 1 . 3
c . . 3
. . . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;

%macro removeRowsWithMissingVals(inDsn, outDsn, Exclusion);
/*Inputs:
        inDsn: Input dataset with some or all columns missing for some or all rows
        outDsn: Output dataset with the offending rows removed
        Exclusion: Should be one of {AND, OR}. AND excludes rows in which ANY column is missing (only complete rows are kept); OR excludes only rows in which ALL columns are missing
*/
%local i;
/*get a list of variables in the input dataset along with their types (i.e., whether they are numeric or character)*/
PROC CONTENTS DATA = &inDsn OUT = CONTENTS(keep = name type varnum) NOPRINT;
RUN;
/*put each variable with its own comparison string in a separate macro variable*/
data _null_;
set CONTENTS end = lastObs;
/*use NE . for numeric cols (type=1) and NE '' for char cols (type=2)*/
if type = 1 then call symputx(cats("var", varnum), catx(" ", name, "NE ."));
else             call symputx(cats("var", varnum), catx(" ", name, "NE ''"));
/*make a note of the number of variables to check in the dataset*/
if lastObs then call symputx("no_of_vars", _n_);
run;

DATA &outDsn;
set &inDsn;
where
%do i = 1 %to &no_of_vars.;
    &&var&i.
        %if &i < &no_of_vars. %then &Exclusion;
%end;
;
run;

%mend removeRowsWithMissingVals;

%removeRowsWithMissingVals(withBlanks, withOutBlanksAND, AND);
%removeRowsWithMissingVals(withBlanks, withOutBlanksOR, OR);

Output of withOutBlanksAND:

a   x   y   z
a   1   2   3
f   1   2   3

Output of withOutBlanksOR:

a   x   y   z
a   1   2   3
b   1   .   3
c   .   .   3
d   .   2   3
e   1   .   3
f   1   2   3
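For four variables a, x, y, z the macro's %DO loop expands to an ordinary WHERE clause, so the two cases can also be written out by hand, which makes the AND/OR logic easy to check. This is a sketch; `blanks_demo`, `want_and`, and `want_or` are hypothetical names. The all-missing record carries four periods, one per variable, because list input would otherwise flow over and read the next record.

```sas
/* Hypothetical copy of the example data; one record is fully missing */
data blanks_demo;
  input a $ x y z;
  datalines;
a 1 2 3
b 1 . 3
c . . 3
. . . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;

/* AND: keep only rows with no missing value at all */
data want_and;
  set blanks_demo;
  where a ne '' and x ne . and y ne . and z ne .;
run;

/* OR: drop only rows in which every variable is missing */
data want_or;
  set blanks_demo;
  where a ne '' or x ne . or y ne . or z ne .;
run;
```

`want_and` keeps rows a and f; `want_or` keeps every row except the fully missing one.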

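It's really strange that nobody has offered this elegant answer:

if missing(cats(of _all_)) then delete;

Note that this only catches all-missing rows when missing numeric values are rendered as blanks (for example after OPTIONS MISSING=' ';). With the default period, CATS returns a string of periods for such a row, which is not missing.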
