How do I use a macro variable in a WHERE statement to subset data by a string? (SAS 9.3)
I want to be able to loop PROC SQL over a list of variables in a dataset, and within the SQL code, use each variable in the list in a WHERE statement to subset the observations by a character value. Specifically, I want to count the observations in the dataset where each variable from the list is coded as "Unknown".
I had no problem using WHERE MISSING(&VAL)=1, but I've run into problems when I try to reference a character value.
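A minimal sketch of the basic pattern in the title. It assumes a hypothetical table WORK.ALLSALM whose character column City is coded "Unknown" for some rows. One macro variable holds the column name and another holds the string to match. The string reference goes inside double quotes, because the macro processor does not resolve references inside single quotes.

```sas
/* Hypothetical sample data standing in for ALLSALM */
data work.allsalm;
  input Year City :$12.;
  datalines;
2010 Boston
2010 Unknown
2010 .
2011 Unknown
2011 Unknown
;
run;

%let VAL = City;     /* macro variable holding a COLUMN name */
%let STR = Unknown;  /* macro variable holding the STRING to match */

proc sql;
  create table work.unk_&VAL as
  select Year, count(*) as Unknown&VAL
  from work.allsalm
  where &VAL = "&STR"  /* "&STR" resolves to "Unknown"; '&STR' would not resolve */
  group by Year;
quit;
```

With this sample data, WORK.UNK_CITY has one row per year: 1 "Unknown" in 2010 and 2 in 2011.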
Here's my code. Since I apparently cannot bold the region that is giving me trouble, I've marked it with <-- PROBLEM AREA (near the bottom). In addition to a solution, any other tips for making my code more efficient would be appreciated.
%MACRO PERCENTMISSING(LIST);
PROC SQL NOPRINT;
  %LET N=%SYSFUNC(COUNTW(&LIST));
  %DO I=1 %TO &N;
    %LET VAL = %SCAN(&LIST,&I);
    CREATE TABLE WORK.SALM_&VAL AS
    SELECT DISTINCT "Salmonella" as PATHOGEN,
           A.YEAR,
           X.Missing&VAL,
           Y.Total&VAL,
           (X.Missing&VAL/Y.Total&VAL) as PropMiss&VAL,
           C.Unknown&VAL,
           (C.Unknown&Val/Y.Total&VAL) as PropUnk&VAL
    FROM allsalm as A
    INNER JOIN (
        SELECT YEAR,
               COUNT(*) AS Missing&VAL
        FROM allsalm
        WHERE MISSING(&VAL)=1
        GROUP BY Year) X
      ON A.Year=X.Year
    INNER JOIN (
        SELECT YEAR,
               COUNT(*) AS Total&VAL
        FROM allsalm
        GROUP BY Year) Y
      ON A.Year=Y.Year
    INNER JOIN (
        SELECT YEAR,
               COUNT(*) AS Unknown&VAL
        FROM allsalm
        WHERE &VAL IN ("Unknown")   /* <-- PROBLEM AREA */
        GROUP BY Year) C
      ON A.Year=C.Year
    ;
  %END;
QUIT;
%MEND;
The error message I get is:
ERROR: Column UnknownCity could not be found in the table/view identified with the correlation name C.
Figured it out myself, and added another DO loop to execute PROC SQL for a list of variables across a list of datasets. Compared with the code in the question, this version uses LEFT JOINs instead of INNER JOINs, so years with no missing or "Unknown" values are kept, and it also counts "U" as an unknown code. This might be a useful template for anyone calculating proportions of missing values (and/or "Unknown" values, if your dataset happens to code missing that way as well) for any number of variables across any number of datasets.
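%MACRO PERCENTMISSING(LIST1,LIST2);
  %LOCAL I J N1 N2 VAL1 VAL2;
  %LET N1=%SYSFUNC(COUNTW(&LIST1));   /* number of tables    */
  %LET N2=%SYSFUNC(COUNTW(&LIST2));   /* number of variables */
  %DO I=1 %TO &N1;
    %LET VAL1 = %SCAN(&LIST1,&I);     /* current table    */
    %DO J=1 %TO &N2;
      %LET VAL2 = %SCAN(&LIST2,&J);   /* current variable */
      PROC SQL NOPRINT;
        CREATE TABLE &VAL1&VAL2 AS
        SELECT DISTINCT "&VAL1" as PATHOGEN,
               A.YEAR,
               COALESCE(X.Missing&VAL2,0) as Missing&VAL2,
               Y.Total&VAL2,
               (COALESCE(X.Missing&VAL2,0)/Y.Total&VAL2) as PropMiss&VAL2,
               COALESCE(C.Unknown&VAL2,0) as Unknown&VAL2,
               (COALESCE(C.Unknown&VAL2,0)/Y.Total&VAL2) as PropUnk&VAL2
        FROM &VAL1 as A
        /* LEFT JOINs keep years with no missing or "Unknown" values;
           COALESCE turns their null counts into 0 */
        LEFT JOIN (
            SELECT YEAR,
                   COUNT(*) AS Missing&VAL2
            FROM &VAL1
            /* MISSING() is already true for blank character values */
            WHERE MISSING(&VAL2)=1
            GROUP BY Year) X
          ON A.Year=X.Year
        LEFT JOIN (
            SELECT YEAR,
                   COUNT(*) AS Total&VAL2
            FROM &VAL1
            GROUP BY Year) Y
          ON A.Year=Y.Year
        LEFT JOIN (
            SELECT YEAR,
                   COUNT(*) AS Unknown&VAL2
            FROM &VAL1
            /* Character variables only: comparing a numeric column to a string is an error */
            WHERE &VAL2 IN ("U","Unknown")
            GROUP BY Year) C
          ON A.Year=C.Year;
      QUIT;
    %END;
  %END;
%MEND;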
Then just invoke the macro, filling in table names for LIST1 and variable names for LIST2. For example:
%PERCENTMISSING(Table1 Table2 Table3 Table4,Var1 Var2 Var3 Var4 Var5);
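On the efficiency question, the three joined subqueries can be replaced by a single grouped pass over each table. This works because MISSING() and a CASE expression both yield 1/0 values that SUM can add up. What follows is a minimal sketch, not the macro above. It assumes character variables (the IN comparison fails on numeric ones) and one-level WORK table names. The macro name %PCTMISS1 and the sample table WORK.SALM are hypothetical. Because every year appears in the grouped result, years with no missing or "Unknown" values get 0 rather than a null.

```sas
/* Hypothetical sample table: Year plus two character variables */
data work.salm;
  input Year City :$12. Serotype :$12.;
  datalines;
2010 Boston Typhi
2010 Unknown .
2010 . U
2011 Unknown Enteritidis
;
run;

%macro pctmiss1(tables, vars);
  %local i j tbl var;
  %do i = 1 %to %sysfunc(countw(&tables));
    %let tbl = %scan(&tables, &i);
    %do j = 1 %to %sysfunc(countw(&vars));
      %let var = %scan(&vars, &j);
      proc sql noprint;
        /* One pass per table/variable: GROUP BY replaces the joined subqueries */
        create table work.&tbl._&var as
        select "&tbl" as PATHOGEN,
               Year,
               sum(missing(&var)) as Missing&var,
               count(*) as Total&var,
               sum(missing(&var)) / count(*) as PropMiss&var,
               sum(case when &var in ("U", "Unknown") then 1 else 0 end) as Unknown&var,
               sum(case when &var in ("U", "Unknown") then 1 else 0 end) / count(*) as PropUnk&var
        from &tbl
        group by Year;
      quit;
    %end;
  %end;
%mend pctmiss1;

%pctmiss1(salm, City Serotype);
```

As with the answer's macro, table names go first and variable names second; each table/variable pair produces one output table (here WORK.SALM_CITY and WORK.SALM_SEROTYPE) with one row per year.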