
PROC SQL - Transposing Data into Columns when Condition Met

My data is structured as below, where each unique ID has one row showing the balance on the last day of each month:

ID      Day_Key    Balance
23412   20171229   50000
23412   20180131   45000
23412   20180228   40000   
27435   20171229   100000    
27435   20180131   80000
27435   20180228   60000

I want to create a table where each unique ID is displayed on one row, with columns holding the balance for each month, like so:

ID     DEC17    JAN18     FEB18
23412  50000    45000     40000
27435  100000   80000     60000

**UPDATE**

My current code is shown below:

PROC SQL;
CREATE TABLE BAL_TRANSPOSE AS 
SELECT DISTINCT ID,
        MAX(SUB_EY17) AS EY17,
      MAX(SUB_JAN18) AS JAN18,
      MAX(SUB_FEB18) AS FEB18,
      MAX(SUB_MAR18) AS MAR18,
      MAX(SUB_APR18) AS APR18,
      MAX(SUB_MAY18) AS MAY18,
      MAX(SUB_JUN18) AS JUN18,
      MAX(SUB_JUL18) AS JUL18,
      MAX(SUB_AUG18) AS AUG18,
      MAX(SUB_SEP18) AS SEP18,
      MAX(SUB_OCT18) AS OCT18,
      MAX(SUB_NOV18) AS NOV18,
      MAX(SUB_EY18) AS EY18
FROM (SELECT DISTINCT ID,
    CASE WHEN DAY_KEY = 20171229 THEN OUTSTANDING_BALANCE END AS SUB_EY17,
    CASE WHEN DAY_KEY = 20180131 THEN OUTSTANDING_BALANCE END AS SUB_JAN18,
    CASE WHEN DAY_KEY = 20180228 THEN OUTSTANDING_BALANCE END AS SUB_FEB18,
    CASE WHEN DAY_KEY = 20180330 THEN OUTSTANDING_BALANCE END AS SUB_MAR18,
    CASE WHEN DAY_KEY = 20180430 THEN OUTSTANDING_BALANCE END AS SUB_APR18,
    CASE WHEN DAY_KEY = 20180531 THEN OUTSTANDING_BALANCE END AS SUB_MAY18,
    CASE WHEN DAY_KEY = 20180629 THEN OUTSTANDING_BALANCE END AS SUB_JUN18,
    CASE WHEN DAY_KEY = 20180731 THEN OUTSTANDING_BALANCE END AS SUB_JUL18,
    CASE WHEN DAY_KEY = 20180831 THEN OUTSTANDING_BALANCE END AS SUB_AUG18,
    CASE WHEN DAY_KEY = 20180928 THEN OUTSTANDING_BALANCE END AS SUB_SEP18,
    CASE WHEN DAY_KEY = 20181031 THEN OUTSTANDING_BALANCE END AS SUB_OCT18,
    CASE WHEN DAY_KEY = 20181130 THEN OUTSTANDING_BALANCE END AS SUB_NOV18,
    CASE WHEN DAY_KEY = 20181231 THEN OUTSTANDING_BALANCE END AS SUB_EY18
FROM TABLE1) AS SUB
GROUP BY ID;   
QUIT;

The new columns are created, but only null values appear. Below are the results I am seeing (trimmed for readability). The query returns over 1m records but, from what I can see, none of them have any values. I have tested the data and know that every ID should have a value for each day_key.

ID      EY17    JAN18        FEB18       MAR18         APR18   
1111    -       -            -            -            -
2222    -       -            -            -            -
3333    -       -            -            -            -
4444    -       -            -            -            -
5555    -       -            -            -            -

You can use PROC TRANSPOSE:

/*prepare*/
data g;
input ID  Day_Key   Balance;
datalines4;
23412   20171229   50000
23412   20180131   45000
23412   20180228   40000   
27435   20171229   100000    
27435   20180131   80000
27435   20180228   60000
;;;;
run;

proc sort data=g;
by id;
run;

/*you need*/
proc transpose data=g out=g2;
id Day_Key;
by id;
var Balance;
run;

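You will get (by default SAS prefixes each numeric column name with an underscore, e.g. _20171229, and also adds a _NAME_ column):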

+-------+----------+----------+----------+
|  ID   | 20171229 | 20180131 | 20180228 |
+-------+----------+----------+----------+
| 23412 |    50000 |    45000 |    40000 |
| 27435 |   100000 |    80000 |    60000 |
+-------+----------+----------+----------+

You can then apply a format to your dates so that the columns get names such as "JAN18".

In addition, you could use IDLABEL.
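A minimal sketch of that idea, assuming the dataset g from the example above and that Day_Key is stored as a number such as 20171229. The names g_dated, g_wide, month_end and month_label are made up for illustration. PROC TRANSPOSE builds the new column names from the formatted value of the ID variable, so a MONYY5. format should yield columns named DEC17, JAN18 and FEB18:

```sas
/* Assumes dataset g (ID, Day_Key, Balance) from the example above. */
data g_dated;
  set g;
  /* Day_Key is a plain number such as 20171229; convert it to a SAS date */
  month_end = input(put(Day_Key, 8.), yymmdd8.);
  /* Hypothetical label text, used only as the column label via IDLABEL */
  month_label = catx(' ', 'Balance at', put(month_end, date9.));
  format month_end monyy5.;
run;

proc sort data=g_dated;
  by id;
run;

proc transpose data=g_dated out=g_wide(drop=_name_);
  by id;
  id month_end;          /* formatted value (e.g. DEC17) becomes the column name */
  idlabel month_label;   /* optional: descriptive label for each new column */
  var Balance;
run;
```

If Day_Key is already a SAS date in your data, the INPUT/PUT conversion is unnecessary and only the FORMAT statement is needed.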

PROC TRANSPOSE is best for this scenario. You were also close with SQL. All you needed was a small change: adding an aggregate function.

PROC SQL;
  CREATE TABLE BAL_TRANSPOSE AS 
  SELECT ID,
         max(CASE WHEN DAY_KEY = 20171229 THEN BALANCE END) AS DEC17,
         max(CASE WHEN DAY_KEY = 20180131 THEN BALANCE END) AS JAN18,
         max(CASE WHEN DAY_KEY = 20180228 THEN BALANCE END) AS FEB18
  FROM TABLE1
  GROUP BY ID;
QUIT;

Your original SQL would work once an aggregate function is added. The technique is known as conditional aggregation, a common way of pivoting data from long to wide when the target columns are known in advance and few in number.

PROC SQL;
   CREATE TABLE BAL_TRANSPOSE AS 
   SELECT ID,
          MAX(CASE WHEN DAY_KEY = 20171229 THEN BALANCE END) AS DEC17,
          MAX(CASE WHEN DAY_KEY = 20180131 THEN BALANCE END) AS JAN18,
          MAX(CASE WHEN DAY_KEY = 20180228 THEN BALANCE END) AS FEB18
   FROM TABLE1
   GROUP BY ID;
QUIT;

However, with SAS PROC SQL you may need to use a subquery:

PROC SQL;
   CREATE TABLE BAL_TRANSPOSE AS 
   SELECT ID, 
          MAX(SUB_DEC17) AS DEC17,
          MAX(SUB_JAN18) AS JAN18,
          MAX(SUB_FEB18) AS FEB18
   FROM (SELECT ID,
                CASE WHEN DAY_KEY = 20171229 THEN BALANCE END AS SUB_DEC17,
                CASE WHEN DAY_KEY = 20180131 THEN BALANCE END AS SUB_JAN18,
                CASE WHEN DAY_KEY = 20180228 THEN BALANCE END AS SUB_FEB18
         FROM TABLE1) AS sub
   GROUP BY ID;
QUIT;

Actually, your original query should have raised an error, since you included non-aggregated columns in SELECT that did not appear in GROUP BY, which is a violation of the ANSI SQL standard. SAS likely converted your attempted aggregate query to unit level (i.e., ignored GROUP BY), as the log notes or warnings may show.
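To check this behaviour yourself, here is a small sketch, assuming a stand-in TABLE1 built from the sample rows in the question (the output name NO_AGG is made up). It runs a GROUP BY with no summary function at all. In that situation SAS should write a warning to the log saying the GROUP BY clause was transformed into an ORDER BY clause, and the result keeps one row per input row rather than one row per ID:

```sas
/* Stand-in for TABLE1, built from the sample rows in the question */
data table1;
  input ID Day_Key Balance;
  datalines;
23412 20171229 50000
23412 20180131 45000
23412 20180228 40000
27435 20171229 100000
27435 20180131 80000
27435 20180228 60000
;
run;

proc sql;
  /* No MAX() or other summary function here, so there is nothing to
     aggregate; check the log for the GROUP BY warning */
  create table no_agg as
  select ID,
         case when Day_Key = 20171229 then Balance end as DEC17
  from table1
  group by ID;
quit;
```

Wrapping the CASE expression in MAX(), as in the queries above, is what collapses the output to a single row per ID.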

Transposing a time dimension into a column identifier can often mean that a report is desired rather than a data transformation.

Consider using TABULATE or REPORT:

data have;
attrib 
  id length=8
  day_key length=4 informat=yymmdd8. format=yymmdd10.
  balance length=8 format=comma12.
;
input ID Day_Key Balance;
datalines;
23412   20171229   50000
23412   20180131   45000
23412   20180228   40000
27435   20171229   100000
27435   20180131   80000
27435   20180228   60000
;
run;

ods html;

proc tabulate data=have;
  class id day_key;
  var balance;
  format day_key monyy7.;
  table 
    id = ''
    ,
    day_key='' * balance='' * max='' * f=comma12.
    /
    box = 'id'
  ;
run;

** -- OR --;

proc report data=have;
  columns id (balance, day_key);
  define id / group;
  define day_key / ' ' across format=monyy7.;
  define balance / ' ' analysis max;
run;
