PROC SQL - Transposing Data into Columns when Condition Met
My data is structured as below, where each unique ID has one row showing the balance on the last day of each month:
ID Day_Key Balance
23412 20171229 50000
23412 20180131 45000
23412 20180228 40000
27435 20171229 100000
27435 20180131 80000
27435 20180228 60000
I want to create a table where each unique ID is displayed on one row, with columns indicating the balance for each month, like so:
ID DEC17 JAN18 FEB18
23412 50000 45000 40000
27435 100000 80000 60000
**UPDATE**
My current code is shown below:
PROC SQL;
CREATE TABLE BAL_TRANSPOSE AS
SELECT DISTINCT ID,
MAX(SUB_EY17) AS EY17,
MAX(SUB_JAN18) AS JAN18,
MAX(SUB_FEB18) AS FEB18,
MAX(SUB_MAR18) AS MAR18,
MAX(SUB_APR18) AS APR18,
MAX(SUB_MAY18) AS MAY18,
MAX(SUB_JUN18) AS JUN18,
MAX(SUB_JUL18) AS JUL18,
MAX(SUB_AUG18) AS AUG18,
MAX(SUB_SEP18) AS SEP18,
MAX(SUB_OCT18) AS OCT18,
MAX(SUB_NOV18) AS NOV18,
MAX(SUB_EY18) AS EY18
FROM (SELECT DISTINCT ID,
CASE WHEN DAY_KEY = 20171229 THEN OUTSTANDING_BALANCE END AS SUB_EY17,
CASE WHEN DAY_KEY = 20180131 THEN OUTSTANDING_BALANCE END AS SUB_JAN18,
CASE WHEN DAY_KEY = 20180228 THEN OUTSTANDING_BALANCE END AS SUB_FEB18,
CASE WHEN DAY_KEY = 20180330 THEN OUTSTANDING_BALANCE END AS SUB_MAR18,
CASE WHEN DAY_KEY = 20180430 THEN OUTSTANDING_BALANCE END AS SUB_APR18,
CASE WHEN DAY_KEY = 20180531 THEN OUTSTANDING_BALANCE END AS SUB_MAY18,
CASE WHEN DAY_KEY = 20180629 THEN OUTSTANDING_BALANCE END AS SUB_JUN18,
CASE WHEN DAY_KEY = 20180731 THEN OUTSTANDING_BALANCE END AS SUB_JUL18,
CASE WHEN DAY_KEY = 20180831 THEN OUTSTANDING_BALANCE END AS SUB_AUG18,
CASE WHEN DAY_KEY = 20180928 THEN OUTSTANDING_BALANCE END AS SUB_SEP18,
CASE WHEN DAY_KEY = 20181031 THEN OUTSTANDING_BALANCE END AS SUB_OCT18,
CASE WHEN DAY_KEY = 20181130 THEN OUTSTANDING_BALANCE END AS SUB_NOV18,
CASE WHEN DAY_KEY = 20181231 THEN OUTSTANDING_BALANCE END AS SUB_EY18
FROM TABLE1) AS SUB
GROUP BY ID;
QUIT;
The new columns are created, but only null values appear. Below are the results I am seeing (trimmed for readability). The query returns over 1 million records, but from what I can see, all of them are empty. I have tested the data and know that every ID should have a value for each day_key.
ID EY17 JAN18 FEB18 MAR18 APR18
1111 - - - - -
2222 - - - - -
3333 - - - - -
4444 - - - - -
5555 - - - - -
You can use PROC TRANSPOSE:
/*prepare*/
data g;
input ID Day_Key Balance;
datalines4;
23412 20171229 50000
23412 20180131 45000
23412 20180228 40000
27435 20171229 100000
27435 20180131 80000
27435 20180228 60000
;;;;
run;
proc sort data=g;
by id;
run;
/*you need*/
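proc transpose data=g out=g2;
by id;          /* one output row per ID */
id Day_Key;     /* values of Day_Key become the column names */
var Balance;    /* the variable whose values fill the new columns */
run;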
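You will get the following (with the default VALIDVARNAME=V7 setting the columns are actually named _20171229 and so on, and the output also contains a _NAME_ column):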
+-------+----------+----------+----------+
| ID | 20171229 | 20180131 | 20180228 |
+-------+----------+----------+----------+
| 23412 | 50000 | 45000 | 40000 |
| 27435 | 100000 | 80000 | 60000 |
+-------+----------+----------+----------+
You can then apply a format to the dates, which gives you column names such as "JAN18".
In addition, you could use IDLABEL.
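The date-formatting suggestion above could be sketched as follows. This is only a sketch, assuming Day_Key is stored as a numeric yyyymmdd value as in the sample data set g; the names g_dates, g_wide and month_end are made up for illustration. PROC TRANSPOSE builds the new column names from the formatted value of the ID variable, so a MONYY5. format should give DEC17, JAN18 and FEB18.

```sas
/* Assumption: Day_Key is a plain number such as 20171229, not a SAS date. */
data g_dates;
  set g;
  /* Convert the yyyymmdd number into a real SAS date value */
  month_end = input(put(Day_Key, 8.), yymmdd8.);
  /* MONYY5. displays the date as DEC17, JAN18, ... */
  format month_end monyy5.;
run;

proc sort data=g_dates;
  by ID month_end;
run;

/* The formatted value of month_end is used to name the new columns */
proc transpose data=g_dates out=g_wide(drop=_name_);
  by ID;
  id month_end;
  var Balance;
run;
```

If Day_Key is already a SAS date in your table, the conversion step is unnecessary and attaching the format is enough.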
PROC TRANSPOSE is best for this scenario. You were also close with SQL. All you needed was a small change: wrapping each CASE expression in an aggregate function.
PROC SQL;
CREATE TABLE BAL_TRANSPOSE AS
SELECT ID,
max(CASE WHEN DAY_KEY = 20171229 THEN BALANCE END) AS DEC17,
max(CASE WHEN DAY_KEY = 20180131 THEN BALANCE END) AS JAN18,
max(CASE WHEN DAY_KEY = 20180228 THEN BALANCE END) AS FEB18
FROM TABLE1
GROUP BY ID;
QUIT;
The original SQL works once an aggregate function is applied, because the technique is conditional aggregation, a common way to pivot data from long to wide when the columns are known and few in number.
PROC SQL;
CREATE TABLE BAL_TRANSPOSE AS
SELECT ID,
MAX(CASE WHEN DAY_KEY = 20171229 THEN BALANCE END) AS DEC17,
MAX(CASE WHEN DAY_KEY = 20180131 THEN BALANCE END) AS JAN18,
MAX(CASE WHEN DAY_KEY = 20180228 THEN BALANCE END) AS FEB18
FROM TABLE1
GROUP BY ID;
QUIT;
However, with SAS PROC SQL you may need to use a subquery:
PROC SQL;
CREATE TABLE BAL_TRANSPOSE AS
SELECT ID,
MAX(SUB_DEC17) AS DEC17,
MAX(SUB_JAN18) AS JAN18,
MAX(SUB_FEB18) AS FEB18
FROM (SELECT ID,
CASE WHEN DAY_KEY = 20171229 THEN BALANCE END AS SUB_DEC17,
CASE WHEN DAY_KEY = 20180131 THEN BALANCE END AS SUB_JAN18,
CASE WHEN DAY_KEY = 20180228 THEN BALANCE END AS SUB_FEB18
FROM TABLE1) AS sub
GROUP BY ID;
QUIT;
Actually, a query that includes non-aggregated columns in SELECT that do not appear in GROUP BY violates the ANSI SQL standard and should raise an error. SAS instead likely converted your attempted aggregate query to unit level (i.e., ignored GROUP BY), possibly with a note or warning in the log.
Transposing a time dimension into a column identifier often means that a report is wanted rather than a data transformation.
Consider using TABULATE or REPORT:
data have;
attrib
id length=8
day_key length=4 informat=yymmdd8. format=yymmdd10.
balance length=8 format=comma12.
;
input
ID Day_Key Balance; datalines;
23412 20171229 50000
23412 20180131 45000
23412 20180228 40000
27435 20171229 100000
27435 20180131 80000
27435 20180228 60000
run;
ods html;
proc tabulate data=have;
class id day_key;
var balance;
format day_key monyy7.;
table
id = ''
,
day_key='' * balance='' * max='' * f=comma12.
/
box = 'id'
;
run;
** -- OR --;
proc report data=have;
columns id (balance, day_key);
define id / group;
define day_key / ' ' across format=monyy7.;
define balance / ' ' analysis max;
run;