
How do I use a macro variable in a WHERE statement to subset data by a string? (SAS 9.3)

I want to loop PROC SQL over a list of variables in a dataset, and within the SQL code, use each variable from the list in a WHERE clause to subset the observations by a character value. Specifically, I want to count the observations in which each variable from the list is coded as "Unknown".

I had no problem using WHERE MISSING(&VAL)=1, but I run into problems when I try to compare the variable against a character value.
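For reference, here is a minimal, self-contained check of the subsetting step on its own. It assumes a hypothetical table WORK.DEMO with a character column City (not from the original post). Once &VAL resolves to a column name, comparing it with a double-quoted literal is ordinary SQL. Double quotes are the safer habit inside macros: the macro processor resolves references inside double quotes but not inside single quotes.

```sas
/* Hypothetical test data: City is character and "Unknown" marks unknown values */
DATA WORK.DEMO;
   LENGTH City $10;
   INPUT Year City $;
   DATALINES;
2010 Boston
2010 Unknown
2011 Unknown
2011 Unknown
;
RUN;

%LET VAL = City;   /* the macro variable holds a column name */

PROC SQL;
   /* &VAL resolves to City, and "Unknown" stays a plain string literal */
   CREATE TABLE WORK.UNK_CHECK AS
      SELECT Year, COUNT(*) AS Unknown&VAL
      FROM WORK.DEMO
      WHERE &VAL IN ("Unknown")
      GROUP BY Year;
QUIT;
```

If this runs cleanly, the WHERE clause itself is not the problem, and the error lies elsewhere in the larger query.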

Here's my code. Since I apparently cannot bold the part that is giving me trouble, I've marked it with <-- PROBLEM AREA (near the bottom). In addition to a solution, any other tips to make my code more efficient would be appreciated.

    %MACRO PERCENTMISSING(LIST);
    PROC SQL NOPRINT;
       %LET N=%SYSFUNC(COUNTW(&LIST));
       %DO I=1 %TO &N;
       %LET VAL = %SCAN(&LIST,&I);
    CREATE TABLE WORK.SALM_&VAL AS
        SELECT DISTINCT "Salmonella" as PATHOGEN,
                            A.YEAR,
                            X.Missing&VAL,
                            Y.Total&VAL,
                            (X.Missing&VAL/Y.Total&VAL) as PropMiss&VAL,
                            C.Unknown&VAL,
                            (C.Unknown&Val/Y.Total&VAL) as PropUnk&VAL
        FROM allsalm as A
        INNER JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Missing&VAL
                    FROM allsalm
                    WHERE MISSING(&VAL)=1
                    GROUP BY Year) X
        ON A.Year=X.Year
        INNER JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Total&VAL
                    FROM allsalm
                    GROUP BY Year) Y
        ON A.Year=Y.Year
        INNER JOIN (
                    SELECT  YEAR,
                            COUNT(*) AS Unknown&VAL
                    FROM allsalm
                    WHERE &VAL IN ("Unknown")   /* <-- PROBLEM AREA */
                    GROUP BY Year) C
        ON A.Year=C.Year
        ;
    %END;
    QUIT;
    %MEND;

The error message I get is:

ERROR: Column UnknownCity could not be found in the table/view identified with the correlation name C.

I figured it out myself, and added another %DO loop to run PROC SQL for a list of variables across a list of datasets. This might be a useful template for anyone trying to calculate the proportion of missing values (and/or "Unknown" values, if your dataset also codes missing that way) for any number of variables across any number of datasets.
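With many variables, typing the variable list by hand gets tedious. As a sketch, assuming the variables of interest are the character columns of a hypothetical table WORK.TABLE1, the list can be pulled from the DICTIONARY.COLUMNS metadata table into a space-separated macro variable (CHARVARS is an illustrative name):

```sas
/* Hypothetical example table; replace with your own data */
DATA WORK.TABLE1;
   LENGTH City $10 State $2;
   INPUT Year City $ State $;
   DATALINES;
2010 Boston MA
2011 Unknown U
;
RUN;

/* Collect the names of all character columns of WORK.TABLE1, in column
   order, into one space-separated macro variable.
   LIBNAME and MEMNAME are stored in uppercase in DICTIONARY.COLUMNS. */
PROC SQL NOPRINT;
   SELECT NAME INTO :CHARVARS SEPARATED BY ' '
      FROM DICTIONARY.COLUMNS
      WHERE LIBNAME = 'WORK' AND MEMNAME = 'TABLE1' AND TYPE = 'char'
      ORDER BY VARNUM;
QUIT;

%PUT CHARVARS=&CHARVARS;
```

The resulting list can then be passed as the variable-list argument of the macro, e.g. %PERCENTMISSING(Table1, &CHARVARS);. Restricting to TYPE = 'char' matters here because the "Unknown" comparison only works on character columns.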

    %MACRO PERCENTMISSING(LIST1,LIST2);
       /* LIST1 = space-separated table names; LIST2 = space-separated variable names */
       %LOCAL I J N1 N2 VAL1 VAL2;
       %LET N1=%SYSFUNC(COUNTW(&LIST1));
       %LET N2=%SYSFUNC(COUNTW(&LIST2));
       %DO I=1 %TO &N1;
          %LET VAL1 = %SCAN(&LIST1,&I);        /* current table */
          %DO J=1 %TO &N2;
             %LET VAL2 = %SCAN(&LIST2,&J);     /* current variable */

    PROC SQL NOPRINT;
    /* One output table per table/variable pair, e.g. Table1Var1 */
    CREATE TABLE &VAL1&VAL2 AS
        SELECT DISTINCT "&VAL1" as PATHOGEN,
                            A.YEAR,
                            X.Missing&VAL2,
                            Y.Total&VAL2,
                            (X.Missing&VAL2/Y.Total&VAL2) as PropMiss&VAL2,
                            C.Unknown&VAL2,
                            (C.Unknown&VAL2/Y.Total&VAL2) as PropUnk&VAL2
        FROM &VAL1 as A
        LEFT JOIN (
                    /* Missing values per year; MISSING() is also true for blank strings */
                    SELECT  YEAR,
                            COUNT(*) AS Missing&VAL2
                    FROM &VAL1
                    WHERE MISSING(&VAL2)=1
                    GROUP BY Year) X
        ON A.Year=X.Year
        LEFT JOIN (
                    /* All observations per year */
                    SELECT  YEAR,
                            COUNT(*) AS Total&VAL2
                    FROM &VAL1
                    GROUP BY Year) Y
        ON A.Year=Y.Year
        LEFT JOIN (
                    /* Values coded as unknown per year (character variables only) */
                    SELECT  YEAR,
                            COUNT(*) AS Unknown&VAL2
                    FROM &VAL1
                    WHERE &VAL2 IN ("U","Unknown")
                    GROUP BY Year) C
        ON A.Year=C.Year;
    QUIT;
          %END;
       %END;
    %MEND;

Then just invoke the macro, filling in table names for LIST1 and variable names for LIST2. For example:

    %PERCENTMISSING(Table1 Table2 Table3 Table4,Var1 Var2 Var3 Var4 Var5);
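On the efficiency question: each table/variable pair above reads the source table four times (the main table plus three subqueries) and then joins the results. A possible alternative, sketched here under the assumption that every table has a YEAR column and every listed variable is character, computes all counts in one grouped pass. PCTMISS_ONEPASS is a hypothetical name, and this is not the author's code. It relies on MISSING() returning 1 or 0, so SUM() of it counts missing values, and on the PROC SQL keyword CALCULATED to reuse the new aggregates.

```sas
/* Hypothetical single-pass variant of PERCENTMISSING (not the author's code).
   Assumes every table has a YEAR column and every variable in LIST2 is
   character, since it is compared with "U" and "Unknown". */
%MACRO PCTMISS_ONEPASS(LIST1, LIST2);
   %LOCAL I J VAL1 VAL2;
   %DO I = 1 %TO %SYSFUNC(COUNTW(&LIST1));
      %LET VAL1 = %SCAN(&LIST1, &I);
      %DO J = 1 %TO %SYSFUNC(COUNTW(&LIST2));
         %LET VAL2 = %SCAN(&LIST2, &J);
         PROC SQL NOPRINT;
            CREATE TABLE &VAL1&VAL2 AS
               SELECT "&VAL1" AS PATHOGEN,
                      YEAR,
                      /* MISSING() is 1 for missing/blank values, 0 otherwise */
                      SUM(MISSING(&VAL2)) AS Missing&VAL2,
                      COUNT(*) AS Total&VAL2,
                      CALCULATED Missing&VAL2 / CALCULATED Total&VAL2 AS PropMiss&VAL2,
                      /* CASE turns each "unknown" code into 1 and everything else into 0 */
                      SUM(CASE WHEN &VAL2 IN ("U", "Unknown") THEN 1 ELSE 0 END)
                         AS Unknown&VAL2,
                      CALCULATED Unknown&VAL2 / CALCULATED Total&VAL2 AS PropUnk&VAL2
               FROM &VAL1
               GROUP BY YEAR;
         QUIT;
      %END;
   %END;
%MEND PCTMISS_ONEPASS;
```

It is called the same way, e.g. %PCTMISS_ONEPASS(Table1 Table2, Var1 Var2);. One behavioral difference: a year with no missing or unknown values gets a count and proportion of 0 here, whereas the LEFT JOIN version produces a missing value for that year.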

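Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.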

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM