Redshift column encoding with dbt

Question

I am new to dbt and have been trying it out on AWS Redshift.

Currently, outside of dbt, I can set the encoding of a column in a CREATE TABLE statement:

create table fact_sales (
  id integer,
  date date NOT NULL encode az64...
)

With dbt, I can control the data types of the model's columns by casting them:

select
  id::integer,
  date::date
FROM stg.sales

Is there a way to set encode az64 via dbt?

Answer 1

I was able to solve this with the following strategy.

First, define the data types of the model's columns explicitly in the CTAS SQL query:

  -- model.sql
  WITH
  /* earlier transform steps (CTEs) go here, each followed by a comma */
  final AS (
    /* cast your projections explicitly */
    SELECT
      id::integer,
      date::date
    FROM _intermediate_step_table
  )

  SELECT * FROM final

Then set the column's encoding with an ALTER TABLE ... ALTER COLUMN ... ENCODE statement in the model's post_hook.

Reference: https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook
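Here is a minimal sketch of both steps in a single model. It assumes a hypothetical model named fact_sales, materialized as a table, that reads from stg.sales. post_hook accepts either one statement or a list of statements. {{ this }} renders to the model's own relation when the hook runs. The ENCODE change uses Redshift's ALTER TABLE ... ALTER COLUMN ... ENCODE syntax. Redshift restricts which columns can be re-encoded, so treat this as a starting point rather than a drop-in.

  -- models/fact_sales.sql (hypothetical model name)
  {{ config(
      materialized = 'table',
      -- each hook runs after the CTAS has built the table
      post_hook = [
        'ALTER TABLE {{ this }} ALTER COLUMN id ENCODE az64',
        'ALTER TABLE {{ this }} ALTER COLUMN "date" ENCODE az64'
      ]
  ) }}

  WITH final AS (
    /* cast your projections explicitly */
    SELECT
      id::integer,
      date::date
    FROM stg.sales
  )

  SELECT * FROM final

After dbt run, you can check whether the encodings took effect with Redshift's pg_table_def catalog view. Note that this view only lists tables in schemas on your search_path:

  SELECT "column", type, encoding
  FROM pg_table_def
  WHERE tablename = 'fact_sales';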

PROS:

Column encoding is not directly supported in dbt's schema.yml, as explained in the discussion in the GitHub issue thread. So this was the only sane way I found to do it without jumping through hoops.

CONS:

If this is a large table, you may want to avoid this approach, because ALTERing a large table can cause performance issues. With a table materialization, the table is rebuilt on every dbt run, so the ALTER runs every time as well.

ALTERNATIVE:

If your model table IS large (billions of rows), you would be using an incremental approach to load it anyway. I did not run into this because of the size of the data I was handling.

In this case:

Create the table, with its column encodings, outside the dbt life cycle.
Use incremental mode to load the table. You would do this anyway, since the table is huge and you want performance, and it won't recreate the table.
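Here is a sketch of this alternative. It makes two assumptions: a hypothetical target schema named analytics that matches where dbt builds the model, and date as the column used to pick up new rows. The table is created once with its encodings. On later runs, the incremental model only inserts into it, because dbt treats an existing relation as incremental. The coalesce guards the first load: if the table is empty, max(date) is NULL and the filter would otherwise match no rows.

  -- Run once, outside dbt (hypothetical schema "analytics")
  CREATE TABLE analytics.fact_sales (
    id   integer ENCODE az64,
    date date    NOT NULL ENCODE az64
  );

  -- models/fact_sales.sql
  {{ config(materialized = 'incremental') }}

  SELECT
    id::integer,
    date::date
  FROM stg.sales
  {% if is_incremental() %}
    -- only load rows newer than what is already in the target table
    WHERE date > (SELECT coalesce(max(date), '1900-01-01'::date) FROM {{ this }})
  {% endif %}

Keep in mind that dbt run --full-refresh rebuilds an incremental model from scratch. That would most likely replace the hand-created table and lose the encodings set in the DDL.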

Answer 1: score 0, accepted, 2022-12-26 04:02:15
