
While doing an incremental load with dbt, I want to aggregate if the row exists, else insert

I am using dbt to incrementally load data from one schema in Redshift to another in order to create reports. dbt has a straightforward way to load data incrementally with an upsert. Instead of a traditional upsert, though, I want to sum the incoming rows with the rows already in the destination table (grouped by the unique id, summing the rest of the columns) when they exist, and insert them otherwise. For example, say I have a table:

T1(userid, total_deposit, total_withdrawal)

I have created a table that calculates the total deposit and total withdrawal for each user. When I run an incremental query, I might get a new deposit or withdrawal for an existing user. In that case, I have to add the new value to the existing row instead of replacing it with an upsert. If the user is new, I just need a simple insert. Any suggestions on how to approach this?

dbt is quite opinionated that invocations of dbt should be idempotent. This means that you can run the same command over and over again, and the result will be the same.

The operation you're describing is not idempotent, so you're going to have a hard time getting it to work with dbt out of the box.
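To make the idempotency problem concrete, here is a minimal sketch in plain Python (not dbt). The `additive_upsert` function and the sample rows are hypothetical; they only mimic the "add to the existing row, else insert" behaviour described in the question.

```python
def additive_upsert(totals, batch):
    """Return new totals after adding each incoming row onto the stored row.

    totals: dict mapping userid -> (total_deposit, total_withdrawal)
    batch:  list of (userid, deposit, withdrawal) tuples (hypothetical shape)
    """
    result = dict(totals)
    for userid, deposit, withdrawal in batch:
        # Existing users are summed; new users start from zero (plain insert).
        old_deposit, old_withdrawal = result.get(userid, (0, 0))
        result[userid] = (old_deposit + deposit, old_withdrawal + withdrawal)
    return result


existing = {1: (100, 20)}          # user 1 already has totals
batch = [(1, 50, 0), (2, 30, 5)]   # new activity for user 1, first rows for user 2

once = additive_upsert(existing, batch)
twice = additive_upsert(once, batch)  # the same batch applied again, e.g. a rerun

print(once)
print(twice)
```

Applying the same batch a second time changes the totals again, so a retried or repeated run would silently double-count.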

As an alternative, I would break this into two steps:

  1. Build an incremental model, where you are appending the new activity.
  2. Create a downstream model that references the incremental model and performs the aggregations you want, to calculate the balance for each customer. You could very carefully craft this as an incremental model with your user_id as the unique_key (since you have all of the raw transactions in #1), but I'd start without that and first make sure it's absolutely necessary for performance reasons, since it will add a fair bit of complexity.
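The two steps above can be sketched outside dbt with Python's built-in `sqlite3` module. This is only an illustration of the pattern, not how dbt implements incremental models. The table `user_activity`, the `txn_id` column, and the helper names are assumptions made for the example; the sketch also assumes every raw transaction carries a unique id so that a repeated load can be ignored.

```python
import sqlite3


def load_batch(conn, batch):
    """Step 1: append new activity. Rows whose txn_id already exists are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO user_activity (txn_id, userid, deposit, withdrawal) "
        "VALUES (?, ?, ?, ?)",
        batch,
    )
    conn.commit()


def user_totals(conn):
    """Step 2: downstream aggregation, recomputed from the full activity log."""
    return conn.execute(
        "SELECT userid, SUM(deposit), SUM(withdrawal) "
        "FROM user_activity GROUP BY userid ORDER BY userid"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_activity ("
    "txn_id INTEGER PRIMARY KEY, "   # hypothetical unique transaction id
    "userid INTEGER NOT NULL, "
    "deposit INTEGER NOT NULL, "
    "withdrawal INTEGER NOT NULL)"
)

load_batch(conn, [(1, 1, 100, 0), (2, 1, 0, 20)])   # initial history for user 1
new_rows = [(3, 1, 50, 0), (4, 2, 30, 5)]
load_batch(conn, new_rows)                          # incremental run
load_batch(conn, new_rows)                          # accidental rerun, no effect

totals = user_totals(conn)
print(totals)
```

Because the totals are always derived from the appended transactions, rerunning either step leaves the result unchanged, which is the property the additive upsert lacks.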

For more info on complex incremental materializations, I suggest this Discourse post written by Tristan Handy, Founder & CEO at dbt Labs.

