
How to detect how many observations are in a dataset (or if it is empty), in SAS?

I wonder if there is a way of detecting whether a data set is empty, i.e. it has no observations. Put another way, how can I get the number of observations in a specific data set?

I want to use the result in an If statement to set some conditions.

Thanks.

It's easy with PROC SQL. Do a count and put the result in a macro variable.

proc sql noprint;
 select count(*) into :observations from library.dataset;
quit;
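To connect this back to the If statement in the question, here is a minimal sketch of branching on the macro variable. It assumes sashelp.class as a stand-in for library.dataset, the macro name check_empty is hypothetical, and the trimmed keyword (available in recent SAS 9 releases) is used to strip leading blanks from the count.

```sas
proc sql noprint;
  /* count(*) returns 0 for an empty table, so the macro variable is always set */
  select count(*) into :observations trimmed from sashelp.class;
quit;

/* Hypothetical wrapper macro: a macro body is a safe place for %if logic */
%macro check_empty;
  %if &observations > 0 %then %put NOTE: The dataset has &observations observations.;
  %else %put NOTE: The dataset is empty.;
%mend check_empty;

%check_empty
```

Wrapping the %if in a macro avoids relying on open-code %if support, which only newer SAS releases provide.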

There are lots of different ways; I tend to use a macro function built on open() and attrn(). Below is a simple example that works well most of the time. If you are going to be dealing with data views, or with more complex situations such as a data set that has records marked for deletion or an active where clause, then you might need more robust logic.

%macro nobs(ds);
    /* Keep the working variables local so they do not leak into the caller */
    %local DSID NOBS RC;
    %let DSID=%sysfunc(OPEN(&ds.,IN));
    %let NOBS=%sysfunc(ATTRN(&DSID,NOBS));
    %let RC=%sysfunc(CLOSE(&DSID));
    &NOBS
%mend;

/* Here is an example */
%put %nobs(sashelp.class);

Here's the more complete example that @cmjohns was talking about. It returns 0 if the dataset is empty and -1 if the dataset does not exist, and it has options to handle deleted observations and where clauses (note that using a where clause can make the macro take a long time on very large datasets).

Usage Notes:

This macro returns the number of observations in a dataset. If the dataset does not exist, -1 is returned. I would not recommend using it with ODBC libnames; use it only against SAS tables.

Parameters:

  • iDs - The libname.dataset that you want to check.
  • iWhereClause ( Optional ) - A where clause to apply.
  • iNobsType ( Optional ) - Either NOBS or NLOBSF. See the SAS V9 documentation for descriptions.
  • iVerbose ( Optional ) - When 1 (the default), a warning is written to the log if the dataset cannot be opened.

Macro definition:

%macro nobs(iDs=, iWhereClause=1, iNobsType=nlobsf, iVerbose=1);
  %local dsid nObs rc;

  %if "&iWhereClause" eq "1" %then %do;
    %let dsID = %sysfunc(open(&iDs));
  %end;
  %else %do;
    %let dsID = %sysfunc(open(&iDs(where=(&iWhereClause))));
  %end;

  %if &dsID %then %do;
    %let nObs = %sysfunc(attrn(&dsID,&iNobsType));
    %let rc   = %sysfunc(close(&dsID));
  %end;
  %else %do;
    %if &iVerbose %then %do;
      %put WARNING: MACRO.NOBS.SAS: %sysfunc(sysmsg());      
    %end;
    %let nObs  = -1;
  %end;
  &nObs
%mend;

Example Usage:

%put %nobs(iDs=sashelp.class);
%put %nobs(iDs=sashelp.class, iWhereClause=height gt 60);
%put %nobs(iDs=this_dataset_doesnt_exist);

Results

19
12
-1

Installation

I recommend setting up a SAS autocall library and placing this macro in your autocall location.
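As a rough sketch of what that setup could look like: the folder /path/to/macros is a hypothetical location, and the macro is assumed to be saved there in a file named nobs.sas so that the file name matches the macro name.

```sas
/* Assumption: /path/to/macros is a hypothetical folder that contains nobs.sas */
/* Keep the default SASAUTOS fileref at the end so SAS-supplied macros still resolve */
options mautosource sasautos=("/path/to/macros", sasautos);

/* The first call compiles the macro from the autocall folder */
%put %nobs(iDs=sashelp.class);
```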

PROC SQL is not efficient when the dataset is large. ATTRN is a good method, but the same thing can be done in a Base SAS data step. Here is an efficient solution that gives the number of observations, even for billions of rows, by reading only the start of the table instead of scanning all of it:

data DS1;
set DS nobs=i;
if _N_ =2 then stop;
No_of_obs=i;
run;

The trick is producing an output even when the dataset is empty.

data CountObs;

    i=1;
    set Dataset_to_Evaluate point=i nobs=j; * 'point' avoids review of full dataset*;
    No_of_obs=j;
    output;  * Produces a value before "stop" interrupts processing *;
    stop;   * Needed whenever 'point' is used *;
    keep No_of_obs;
run;

proc print data=CountObs;
run;

The above code is the simplest way I've found to produce the number of observations even when the dataset is empty. I've heard NOBS can be tricky, but the above works for simple applications.
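A related idiom, sketched here as an alternative and not taken from the answer above, never executes the SET statement at all: the NOBS= variable is filled in when the step is compiled, so nothing is read and an empty dataset still yields 0. The names n and n_obs are illustrative, and sashelp.class stands in for the dataset being checked.

```sas
data _null_;
  /* The condition is always false, so no observation is ever read;   */
  /* NOBS= is populated at compile time, before the step executes.    */
  if 0 then set sashelp.class nobs=n;
  call symputx('n_obs', n);  /* store the count in a macro variable */
  stop;
run;

%put NOTE: sashelp.class has &n_obs observations.;
```

Like the NOBS attribute in general, this counts physical observations for native SAS datasets and is not reliable for views or external database tables.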

A slightly different approach:

proc contents data=library.dataset out=nobs;
run;

proc summary data=nobs nway;
class nobs;
var delobs;
output out=nobs_summ sum=;
run;

This will give you a dataset with one observation; the variable nobs holds the number of observations in the dataset, even if it is 0.

I guess I am reinventing the wheel here, with so many answers already. But I do see some of the other methods counting from the actual dataset, which might take a long time for huge datasets. Here is a more efficient method:

proc sql;
/* libname and memname are stored in uppercase in sashelp.vtable */
select nlobs from sashelp.vtable where libname = "LIBRARY" and memname="DATASET";
quit;
