Aliases and Group By Statements in SAS Proc SQL
I am using proc SQL in SAS, and one of my proc sql queries is behaving strangely.
I have a large dataset (about one million rows) that looks like this:
apple_key  profit  price  cost  months  date
golden_d   0.03    12     4     3       01/12
golden_d   0.03    8      0     2       01/12
granny_s   0.05    15     5     5       02/12
red_d      0.04    13     0     1       01/12
golden_d   0.02    1      2     12      03/14
On this dataset, I am running the following query:
%let picking_date = 01/12; /* I simplify here - this part of my code definitely works */
proc sql;
    CREATE TABLE output AS
    SELECT
        (CASE apple_key
            WHEN "golden_d" THEN 1
            WHEN "granny_s" THEN 2
            WHEN "red_d" THEN 3
         END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0
            ELSE 1
         END) AS cost_flag,
        (CASE
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
            ELSE 5
         END) AS age,
        "McDonalds" as farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
        AND cost >= 0
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ;
run;
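When I run this, the GROUP BY statement does not work. I get many entries for a single group (apple_id, apple_name, cost_flag, age and farm are all identical, yet the rows are not aggregated).
However, when I run the GROUP BY as a separate step (as shown below), everything works. For each group I get exactly one "price weighted profit" entry: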
proc sql;
    CREATE TABLE output_tmp AS
    SELECT
        (CASE apple_key
            WHEN "golden_d" THEN 1
            WHEN "granny_s" THEN 2
            WHEN "red_d" THEN 3
         END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0
            ELSE 1
         END) AS cost_flag,
        (CASE
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
            ELSE 5
         END) AS age,
        "McDonalds" as farm
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
        AND cost >= 0
    ;
    CREATE TABLE output AS
    SELECT
        apple_id,
        apple_name,
        cost_flag,
        age,
        farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM output_tmp
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ;
quit;
Why is this happening, and how can I fix it? This is driving me a bit crazy... Thanks for your help.
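It is not working because the GROUP BY is not treating the sum(profit*price)/sum(price) expression as an aggregate over the groups. It cannot do so because of the aliases used for the grouping keys, such as age and cost_flag.
In any case, the following query is correct. It computes the aliased columns in an inline view and groups in the outer query: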
Proc sql;
    CREATE TABLE output AS
    SELECT
        apple_id,
        apple_name,
        cost_flag,
        age,
        farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM
    (
        SELECT
            (CASE apple_key
                WHEN "golden_d" THEN 1
                WHEN "granny_s" THEN 2
                WHEN "red_d" THEN 3
             END) AS apple_id,
            apple_key AS apple_name,
            (CASE WHEN cost= 0 THEN 0
                ELSE 1
             END) AS cost_flag,
            (CASE
                WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
                ELSE 5
             END) AS age,
            "McDonalds" as farm
        FROM input_table
        WHERE date = "&picking_date."d
            AND price > cost
            AND cost >= 0
            AND cost >= 0
    ) a
    GROUP BY apple_id, apple_name, cost_flag, age, farm;
quit;
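Let me know if you have any questions.
Rule of thumb: whenever you use an aggregate function in the select clause, all the remaining columns should be part of the group by. In the question as posted, you apply sum(profit*price)/sum(price), but it is not matched by a group by, and that is what causes the problem.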
Proc sql;
    CREATE TABLE output AS
    SELECT
        (CASE apple_key
            WHEN "golden_d" THEN 1
            WHEN "granny_s" THEN 2
            WHEN "red_d" THEN 3
         END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0
            ELSE 1
         END) AS cost_flag,
        (CASE
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
            ELSE 5
         END) AS age,
        "McDonalds" as farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
        AND cost >= 0
    GROUP BY apple_id, apple_name, cost_flag, age, farm;
quit;
I suspect that what is happening is remerging. SAS proc sql accepts code such as:
proc sql;
    select a.*, count(*)
    from a;
quit;
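This does not aggregate the data. Instead, it puts the overall count on every row. In other words, if the keys in the select do not exactly match the group by, the aggregate functions are still calculated per group by key, but the results are attached to the individual rows. Other databases do this with a subset of window functions.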
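To make the remerging behaviour concrete, here is a minimal sketch. The dataset name remerge_demo and the alias n_rows are made up for illustration; the rows are copied from the sample in the question with the date column left out. The first query should return all five rows, each carrying n_rows = 5, and the SAS log should contain a note that the query requires remerging summary statistics back with the original data. The second query uses the NOREMERGE option of PROC SQL, which is expected to turn the same situation into an error instead of silently remerging.

```sas
/* Hypothetical demo table: five rows taken from the question's sample data. */
data remerge_demo;
    input apple_key $ profit price cost months;
    datalines;
golden_d 0.03 12 4 3
golden_d 0.03 8 0 2
granny_s 0.05 15 5 5
red_d 0.04 13 0 1
golden_d 0.02 1 2 12
;
run;

/* No GROUP BY: count(*) is computed over the whole table and then
   remerged onto every detail row, so five rows come back. */
proc sql;
    select apple_key, count(*) as n_rows
    from remerge_demo;
quit;

/* Same query with NOREMERGE: PROC SQL should refuse to remerge
   and report an error in the log instead. */
proc sql noremerge;
    select apple_key, count(*) as n_rows
    from remerge_demo;
quit;
```

Running a suspicious query under NOREMERGE, or searching the log for the remerging note, is a quick way to check whether a GROUP BY query is really aggregating.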
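In your case, the remerging is not obvious. I think there is confusion over the keys, because the names you use in the select are the same as names in the original data. My suggestion is to change the aliases so that they are unambiguous, and to make sure the keys in the group by are unambiguous as well.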
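A sketch of that suggestion, under stated assumptions: the grp_ prefix, the table names input_sample and output_sample, and the sample rows are all hypothetical, and the date filter from the question is omitted so that the example is self-contained. It assumes the real input table may contain columns whose names collide with the original aliases (for example age or farm), which the question does not confirm. Every alias is given a name that cannot exist in the input, and the GROUP BY refers only to those aliases.

```sas
/* Hypothetical sample; the first and last rows fall into the same group. */
data input_sample;
    input apple_key $ profit price cost months;
    datalines;
golden_d 0.03 12 4 3
golden_d 0.03 8 0 2
granny_s 0.05 15 5 5
red_d 0.04 13 0 1
golden_d 0.05 10 3 4
;
run;

proc sql;
    CREATE TABLE output_sample AS
    SELECT
        (CASE apple_key
            WHEN "golden_d" THEN 1
            WHEN "granny_s" THEN 2
            WHEN "red_d" THEN 3
         END) AS grp_apple_id,
        apple_key AS grp_apple_name,
        (CASE WHEN cost = 0 THEN 0
            ELSE 1
         END) AS grp_cost_flag,
        (CASE
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
            ELSE 5
         END) AS grp_age,
        "McDonalds" AS grp_farm,
        sum(profit*price)/sum(price) AS price_weighted_profit
    FROM input_sample
    WHERE price > cost
        AND cost >= 0
    /* Group only by the renamed aliases, so no key can be
       mistaken for a column of the input table. */
    GROUP BY grp_apple_id, grp_apple_name, grp_cost_flag, grp_age, grp_farm
    ;
quit;
```

If the aggregation works, output_sample should hold one row per distinct key combination and the log should show no remerging note. PROC SQL also accepts column positions in GROUP BY (here GROUP BY 1, 2, 3, 4, 5), which is another way to avoid relying on names.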