
INSERT INTO table SELECT Redshift super slow

We have a large table that we need to deep copy. Since we don't have enough free disk space to do it in a single statement, I've tried to do it in batches, but the batches run very slowly.

I'm running something like this:

   INSERT INTO new_table 
   SELECT * FROM old_table 
    WHERE creation_date between '2018-01-01' AND '2018-02-01'

This happens even though the SELECT on its own returns only a small number of rows (~1K):

SELECT * FROM old_table 
WHERE creation_date between '2018-01-01' AND '2018-02-01'
  • The INSERT query takes around 50 minutes to complete.

  • old_table has ~286M rows and ~400 columns.

  • creation_date is one of the SORTKEYs.

The explain plan looks like:

XN Seq Scan on old_table  (cost=0.00..4543811.52 rows=178152 width=136883)
      Filter: ((creation_date <= '2018-02-01'::date) AND (creation_date >= '2018-01-01'::date))

My question is:

  • What could be the reason for the INSERT query taking this long?

In my opinion, there are two possibilities, though it would help if you could add more details to your question.

  1. As @John stated in the comments, the SORTKEY matters a lot in Redshift. Is creation_date a sort key?
  2. Did you do a lot of updates on old_table? If so, run VACUUM DELETE ONLY old_table first, and then run the SELECT queries. Updated and deleted rows stay on disk until a vacuum reclaims them, so scans still have to read them.
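Both points can be checked from SQL. This is a rough sketch rather than a definitive diagnosis: it assumes the table is literally named old_table and that you can read the SVV_TABLE_INFO system view. If sortkey1 is not creation_date, a date range filter is much less likely to skip blocks, and a high unsorted percentage points the same way.

```sql
-- Inspect sort key and table health (SVV_TABLE_INFO is a Redshift system view).
-- sortkey1    : first column of the sort key
-- sortkey_num : number of columns in the sort key
-- unsorted    : percent of rows in the unsorted region
-- stats_off   : how stale the planner statistics are (0 = current)
SELECT "table", sortkey1, sortkey_num, unsorted, tbl_rows, stats_off
FROM svv_table_info
WHERE "table" = 'old_table';

-- Reclaim space from rows marked for deletion by earlier UPDATE/DELETE
-- statements, without re-sorting. VACUUM cannot run inside a transaction block.
VACUUM DELETE ONLY old_table;

-- Optional: refresh statistics so the planner's row estimates are closer.
ANALYZE old_table;
```

After the vacuum, rerun the batch SELECT on its own and compare its runtime before retrying the INSERT.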

Another option is to go through S3 (UNLOAD the data and COPY it back in), though I'm not sure whether you want to do that.
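A minimal sketch of the S3 route for one batch, under assumptions: the bucket path and IAM role ARN below are placeholders, not values from the question, and new_table must already exist with the same column order as old_table. The GZIP and ESCAPE options are one reasonable choice for text data, not the only one, and they have to match on both sides.

```sql
-- Export one month of rows to S3 (bucket and role are hypothetical).
-- Single quotes inside the UNLOAD query string are doubled.
UNLOAD ('SELECT * FROM old_table
         WHERE creation_date BETWEEN ''2018-01-01'' AND ''2018-02-01''')
TO 's3://my-example-bucket/old_table/2018-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
ESCAPE;

-- Load the exported files into the new table with matching options.
COPY new_table
FROM 's3://my-example-bucket/old_table/2018-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
ESCAPE;
```

Note that BETWEEN includes both endpoints, so if consecutive batches share a boundary date (for example, the next batch starts at '2018-02-01'), rows on that date would be copied twice. Using `creation_date >= '2018-01-01' AND creation_date < '2018-02-01'` for each batch avoids the overlap.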
