Aliases and Group By Statements in SAS Proc SQL

I am using proc SQL in SAS, and one of my proc sql queries is behaving strangely:

I have a large dataset (about one million rows) that looks like this:

apple_key    profit    price    cost    months    date
golden_d     0.03      12       4       3         01/12
golden_d     0.03      8        0       2         01/12
granny_s     0.05      15       5       5         02/12
red_d        0.04      13       0       1         01/12
golden_d     0.02      1        2       12        03/14

On this dataset, I am running the following query:

%let picking_date = 01/12; /* I simplify here - this part of my code definitely works */

proc sql; 
    CREATE TABLE output AS 
    SELECT 
        (CASE apple_key
              WHEN "golden_d" THEN 1
              WHEN "granny_s" THEN 2
              WHEN "red_d"    THEN 3
        END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0 
            ELSE 1 
        END) AS cost_flag,
        (CASE 
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
            ELSE 5
        END) AS age, 
        "McDonalds" as farm, 
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ; 
quit;

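When I run this, the GROUP BY statement doesn't work. I get many entries for a single group (apple_id, apple_name, cost_flag, age and farm are all identical), so the aggregation is not being applied.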

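However, when I run the GROUP BY as a separate step (as shown below), everything works fine, and I get a single price_weighted_profit entry for each group: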

proc sql; 
    CREATE TABLE output_tmp AS 
    SELECT 
        (CASE apple_key
              WHEN "golden_d" THEN 1
              WHEN "granny_s" THEN 2
              WHEN "red_d"    THEN 3
        END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0 
            ELSE 1 
        END) AS cost_flag,
        (CASE 
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
            ELSE 5
        END) AS age, 
        "McDonalds" as farm,
        profit,  /* needed by the weighted sum in the second step */
        price
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
   ;

    CREATE TABLE output AS
    SELECT 
        apple_id, 
        apple_name, 
        cost_flag, 
        age, 
        farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM output_tmp
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ;
quit;

Why is this happening, and how can I fix it? It's driving me a bit crazy... Thanks for your help.

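This doesn't work because the GROUP BY does not treat the sum(profit*price)/sum(price) expression as an aggregate function. It can't do so because you are using aliases such as age, cost_flag, etc.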

Anyway, here is the correct query:

 Proc sql;
    CREATE TABLE output AS 
     SELECT 
            apple_id, 
            apple_name, 
            cost_flag, 
            age, 
            farm, 
            sum(profit*price)/sum(price) as price_weighted_profit
        FROM
       (
        SELECT 
            (CASE apple_key
                  WHEN "golden_d" THEN 1
                  WHEN "granny_s" THEN 2
                  WHEN "red_d"    THEN 3
            END) AS apple_id,
            apple_key AS apple_name,
            (CASE WHEN cost= 0 THEN 0 
                ELSE 1 
            END) AS cost_flag,
            (CASE 
                WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
                ELSE 5
            END) AS age, 
            "McDonalds" as farm,
            profit,  /* needed by the weighted sum in the outer query */
            price
        FROM input_table
        WHERE date = "&picking_date."d
            AND price > cost
            AND cost >= 0
        ) a
        GROUP BY apple_id, apple_name, cost_flag, age, farm;
        quit;

Let me know if you have any questions.

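Rule of thumb: whenever you use an aggregate function in the select clause, all the remaining columns should be part of the group by. In the query you posted, you apply sum(profit*price)/sum(price), but there is no group by for it, which is what causes the problem.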

Proc sql;
    CREATE TABLE output AS 
        SELECT 
            (CASE apple_key
                  WHEN "golden_d" THEN 1
                  WHEN "granny_s" THEN 2
                  WHEN "red_d"    THEN 3
            END) AS apple_id,
            apple_key AS apple_name,
            (CASE WHEN cost= 0 THEN 0 
                ELSE 1 
            END) AS cost_flag,
            (CASE 
                WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
                ELSE 5
            END) AS age, 
            "McDonalds" as farm, 
            sum(profit*price)/sum(price) as price_weighted_profit
        FROM input_table
        WHERE date = "&picking_date."d
            AND price > cost
            AND cost >= 0
        GROUP BY apple_id, apple_name, cost_flag, age, farm;
        quit;

I suspect that what is happening is remerging. SAS proc sql accepts code like this:

proc sql;
    select a.*, count(*)
    from a;
quit;

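This does not aggregate the data. Instead, it puts the overall count on every row. In other words, if the keys in the select do not exactly match the group by, the aggregate functions are computed using the group by keys, but the results are placed on the individual rows. Other databases do this with a subset of window functions.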
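A minimal sketch of the remerge behaviour described above, using the five sample rows from the question (the date column is dropped because it plays no part here). The table names apples, remerged and collapsed are hypothetical. The first query keeps price, which is neither grouped nor aggregated, so SAS remerges; the second lists only grouped columns and aggregates, so it collapses to one row per group.

```sas
/* Hypothetical demo table: the sample rows from the question, without date */
data apples;
    input apple_key $ profit price cost months;
    datalines;
golden_d 0.03 12 4 3
golden_d 0.03 8 0 2
granny_s 0.05 15 5 5
red_d 0.04 13 0 1
golden_d 0.02 1 2 12
;
run;

proc sql;
    /* price is neither grouped nor aggregated, so SAS remerges:
       the per-group count is written back onto every detail row.
       The log shows: "The query requires remerging summary
       statistics back with the original data." */
    create table remerged as
    select apple_key, price, count(*) as n_in_group
    from apples
    group by apple_key;

    /* Every non-aggregated column is in the group by,
       so the result has one row per apple_key */
    create table collapsed as
    select apple_key, count(*) as n_in_group
    from apples
    group by apple_key;
quit;
```

With these rows, remerged keeps all five input rows (each golden_d row carries n_in_group = 3), while collapsed has three rows. Checking the log for the remerge note is a quick way to tell whether a query like the one in the question is being remerged rather than aggregated.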

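In your case, the remerging is not obvious. I think the key confusion here is that you use the same names in the select as in the original data. My suggestion is to change the aliases so that they are unambiguous, and to make sure the keys in the group by are unambiguous.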


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.