
How can I write this postgres query in Amazon redshift such that it is as optimized as it was in postgres?

This is the original query I was using in postgres:

SELECT a.id,
    (SELECT val
       FROM database.detail x
      WHERE name = 'blablah'
        AND x.id = b.id) AS myGroup,
    c.username,
    a.someCode,
    a.timeTaken,
    a.date ::timestamp WITH time ZONE AT time ZONE 'PST' AS date,
    SUM (CASE WHEN (b.name = 'name1') THEN b.val ::INTEGER ELSE 0 END ) AS name11,
    SUM (CASE WHEN (b.name = 'name2') THEN b.val ::INTEGER ELSE 0 END ) AS name12
FROM
    database.myTable a,
    database.detail b,
    database.client c
WHERE
    a.id = b.id
    AND a.c_id = c.c_id
    AND a.date > current_date - interval '2 weeks'
GROUP BY 1, 2, 3, 4, 5, 6

Here is how I converted it into an Amazon redshift query:

SELECT a.id,
    b.val AS myGroup,
    c.username,
    a.someCode,
    a.timeTaken,
    convert_timezone('PST', a.date) AS date,
    SUM (CASE WHEN (b.name = 'name1') THEN b.val ::INTEGER ELSE 0 END ) AS name11,
    SUM (CASE WHEN (b.name = 'name2') THEN b.val ::INTEGER ELSE 0 END ) AS name12
FROM
    database.myTable a,
    database.detail b,
    database.client c
WHERE
    a.id = b.id
    AND b.name = 'blablah'
    AND a.c_id = c.c_id
    AND a.date > current_date - interval '2 weeks'
GROUP BY 1, 2, 3, 4, 5, 6 LIMIT 10

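The CASE statements do not seem to behave as expected: name11 and name12 both come back as zero. My postgres query returned valid values for them, but the redshift query does not.

The query is also very slow. The postgres query takes about 150 ms, while this one takes 2 minutes.

How can we do better?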

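Query optimization in Redshift depends on the cluster, table design, data loading, and regular vacuuming and analyzing of the data.

Let me address a few core points from that list.

  1. Make sure your tables myTable, detail and client have appropriate SORT_KEY and DIST_KEY settings.
  2. Make sure every table involved in the joins has been properly analyzed and vacuumed.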
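The two points above could be sketched as follows. The key choices are assumptions, not something stated in the question: id is used as the distribution key because myTable and detail are joined on it, date is used as the sort key because the query filters on a date range, and client is assumed to be small enough to copy to every node. Check them against your real data volumes before applying anything.

```sql
-- Assumption: myTable and detail are joined on id, so co-locate matching rows.
ALTER TABLE database.myTable ALTER DISTSTYLE KEY DISTKEY id;
ALTER TABLE database.detail  ALTER DISTSTYLE KEY DISTKEY id;

-- Assumption: the range filter on a.date benefits from a sort key on date.
ALTER TABLE database.myTable ALTER SORTKEY (date);

-- Assumption: client is a small lookup table, so replicate it to all nodes.
ALTER TABLE database.client ALTER DISTSTYLE ALL;

-- Re-sort rows and reclaim space, then refresh the planner statistics.
VACUUM database.myTable;
VACUUM database.detail;
VACUUM database.client;

ANALYZE database.myTable;
ANALYZE database.detail;
ANALYZE database.client;
```

Running EXPLAIN on the query before and after these changes shows whether the joins still redistribute or broadcast rows between nodes.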

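Here is another version of the same SQL, written for Redshift.

I made a few adjustments:

  1. Used a WITH clause to optimize computation at the cluster level.
  2. Used explicit join syntax. Whether a left or right join is needed depends on your data, so check that.
  3. Used the date_range WITH-clause table to give the query a more modular, object-oriented structure.
  4. Applied the GROUP BY in the main SQL below.

My version of the Redshift SQL: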

/** Date Range Computation **/
with date_range as (
    select ( current_Date - interval '2 weeks' ) as two_weeks
),
/** Filter main ResultSet**/
myGroupSet as (
    SELECT b.val AS myGroup,
           c.username,
           a.someCode,
           a.timeTaken,
           /** date must be selected here because the outer query groups by it **/
           convert_timezone('PST', a.date) as date,
           (case when (b.name = 'name1') THEN b.val::INTEGER ELSE 0 END ) as name11,
           (case when (b.name = 'name2') THEN b.val::INTEGER ELSE 0 END ) as name12
      FROM database.myTable a
      join date_range dr on a.date > dr.two_weeks
      join database.detail b on b.id = a.id
      join database.client c on c.c_id = a.c_id
     where a.date > current_Date - interval '2 weeks'
)
/** Apply Aggregation **/
select myGroup, username, someCode, timeTaken, date,
       sum(name11) as name11, sum(name12) as name12
  from myGroupSet
  group by myGroup, username, someCode, timeTaken, date
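
On the zero values: in the converted query from the question, the condition b.name = 'blablah' sits in the WHERE clause, so only 'blablah' rows of detail reach the CASE expressions and neither 'name1' nor 'name2' can ever match. A minimal sketch of one way around this is to join detail twice, once for the values being summed and once for the group lookup. The alias g is hypothetical, and the sketch assumes at most one 'blablah' row per id, which the scalar subquery in the original postgres query also requires.

```sql
SELECT a.id,
       g.val AS myGroup,          -- replaces the correlated subquery
       c.username,
       a.someCode,
       a.timeTaken,
       convert_timezone('PST', a.date) AS date,  -- expression taken from the question
       SUM(CASE WHEN b.name = 'name1' THEN b.val::INTEGER ELSE 0 END) AS name11,
       SUM(CASE WHEN b.name = 'name2' THEN b.val::INTEGER ELSE 0 END) AS name12
  FROM database.myTable a
  JOIN database.client c ON c.c_id = a.c_id
  -- b keeps every detail row, so the CASE expressions can see name1 and name2.
  JOIN database.detail b ON b.id = a.id
  -- g (hypothetical alias) carries only the 'blablah' row used for myGroup.
  -- LEFT JOIN keeps ids without such a row, as the subquery did (myGroup is NULL).
  LEFT JOIN database.detail g ON g.id = a.id AND g.name = 'blablah'
 WHERE a.date > current_date - interval '2 weeks'
 GROUP BY 1, 2, 3, 4, 5, 6;
```

Keeping the 'blablah' condition inside the ON clause rather than the WHERE clause is the important part: it restricts only g and leaves the rows of b untouched.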
