
SAS - balanced panel data set using proc sql

I'm using the following PROC SQL step to pull data:

proc sql;
create table panel as
select ID, Month, Var1, Var2, Var3
from data
order by ID, Month;
quit;

I want to use the data to build a balanced panel data set, but some IDs will be missing from the pull. For a missing ID, every variable should equal zero in every month.

I cannot figure out how to write a query or a data step that inserts the missing IDs into the data set for each month and fills in zeros as the values.

For example, my query produces the following table:
UNBALANCED PANEL

My problem is that there is an ID "A" that is not represented in the data I'm pulling, even though ID "A" does exist. To add complexity, ID "C" appears in the PROC SQL result intermittently rather than every month, and I would like it to show zeros for each month in which it does not appear in the database. In short, I want every known ID to appear in every month, with zeros for each Var wherever data is missing.

For example:
BALANCED PANEL

This has been stumping me for a few weeks, so any insights would be greatly appreciated!

This is not the most elegant solution, but it uses basic code that's easy to understand:

1) Have a dataset with all known IDs and Months

/* One row per known ID and month: each ID read from datalines is written out 12 times */
data ids;
infile datalines;
input ID $;
month='Jan'; output;
month='Feb'; output;
month='Mar'; output;
month='Apr'; output;
month='May'; output;
month='Jun'; output;
month='Jul'; output;
month='Aug'; output;
month='Sep'; output;
month='Oct'; output;
month='Nov'; output;
month='Dec'; output;
datalines;
A
B
C
D
;
run;

(This example is static because I don't know your data. If you can pull the IDs and months from somewhere, e.g. select distinct ID, Month from table, that is of course much better.)
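As a rough sketch of that dynamic alternative, the grid of IDs and months could be built with a cross join instead of datalines. This assumes a table listing every known ID exists; the name id_master below is hypothetical and not from the question. It also assumes every month of interest appears at least once in data and that Month has the same type and format in both places, since step 3 joins on it.

proc sql;
/* id_master is a hypothetical table holding all known IDs, including ones absent from data */
create table ids as
select i.ID, m.Month
from (select distinct ID from id_master) as i
cross join (select distinct Month from data) as m
order by i.ID, m.Month;
quit;

Pulling the IDs from data itself would not be enough here, because an ID such as "A" that never appears in data would never make it into the grid.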

2) Run your proc sql as you did:

proc sql;
create table panel as  
select ID, Month, Var1, Var2, Var3  
from data  
order by ID, Month;  
quit;

3) Then right join your result with the "table of zeroes" to get those records for the "missing IDs"

proc sql;
create table panel_balanced as
select coalesce(t1.ID,t2.ID) as ID
      ,coalesce(t1.Month,t2.Month) as Month
      ,coalesce(t1.var1,0) as var1
      ,coalesce(t1.var2,0) as var2
      ,coalesce(t1.var3,0) as var3
from panel t1
right join ids t2
  on t1.ID=t2.ID
  and t1.Month=t2.Month
;
quit;

You can of course combine steps 2 and 3 into one query, or even do the whole thing in a single SQL query if the table from step 1 can also be created with SQL.
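One possible way to combine steps 2 and 3, sketched under the assumption that the ids table from step 1 already exists and that data is the source table from the question. It starts from ids and left joins data, which is equivalent to the right join in step 3 with the tables swapped:

proc sql;
create table panel_balanced as
select g.ID
      ,g.Month
      ,coalesce(d.Var1,0) as Var1   /* zero where the ID/month has no row in data */
      ,coalesce(d.Var2,0) as Var2
      ,coalesce(d.Var3,0) as Var3
from ids g
left join data d
  on g.ID=d.ID
  and g.Month=d.Month
order by g.ID, g.Month;
quit;

Because every row comes from ids, ID and Month can be taken from it directly without coalesce. Note that if Month is stored as text such as 'Jan', the order by sorts it alphabetically rather than by calendar order.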
