
Merging 2 partitioned tables in BigQuery

I am trying to merge 2 partitioned tables in BigQuery:

  • 'source_t' is the source table. It is partitioned by ingestion time, with the partition filter set to Required. The pseudo-column _PARTITIONTIME is a TIMESTAMP.

  • 'target_t' is the target table. It is partitioned by the field 'date', with the partition filter set to Required. The field date is a DATE.
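To reproduce the setup described above, the two tables could be created roughly like this. This is a hypothetical sketch: the column names are taken from the MERGE in the question, but every column type (STRING ids, INT64 counters, FLOAT64 spend) is an assumption.

```sql
-- Hypothetical DDL; column types are assumptions, names come from the question's MERGE.

-- Source: ingestion-time partitioned, partition filter required.
CREATE TABLE MyDataSet.source_t (
  date DATE,
  account_id STRING,
  account_name STRING,
  campaign_id STRING,
  campaign_name STRING,
  adset_id STRING,
  adset_name STRING,
  ad_id STRING,
  ad_name STRING,
  impressions INT64,
  clicks INT64,
  spend FLOAT64
)
PARTITION BY _PARTITIONDATE
OPTIONS (require_partition_filter = TRUE);

-- Target: partitioned on its own DATE column, partition filter required.
CREATE TABLE MyDataSet.target_t (
  date DATE,
  account_id STRING,
  account_name STRING,
  campaign_id STRING,
  campaign_name STRING,
  adset_id STRING,
  adset_name STRING,
  ad_id STRING,
  ad_name STRING,
  impressions INT64,
  clicks INT64,
  spend FLOAT64
)
PARTITION BY date
OPTIONS (require_partition_filter = TRUE);
```

With require_partition_filter = TRUE on both tables, any statement that reads either table must include a filter that BigQuery can use to eliminate partitions, which is exactly what the error below complains about.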

I want to take the data from the latest partition of the source table and merge it into the target table. To satisfy the partition filter on the target table, I need to filter on its 'date' field using the dates found in the source data. I wrote a query, but the editor shows the following error:

Cannot query over table 'MyDataSet.target_t' without a filter over column(s) 'date'

Here is my query:

declare latest default (select date(max(_PARTITIONTIME)) as latest from MyDataSet.source_t where _PARTITIONTIME >= timestamp(date_sub(current_date(),interval 1 day))); 
declare first_date default (select min(date) as first_date from MyDataSet.source_t where date(_PARTITIONTIME) = latest);
merge `MyDataSet.target_t` as a
using (select * from `MyDataSet.source_t` where _PARTITIONTIME = latest) as b 
on
  a.date >= first_date and
  a.date = b.date and
  a.account_id = b.account_id and 
  a.campaign_id = b.campaign_id and 
  a.adset_id = b.adset_id and 
  a.ad_id = b.ad_id 
when matched then update set 
  a.account_name = b.account_name, 
  a.campaign_name = b.campaign_name, 
  a.adset_name = b.adset_name, 
  a.ad_name = b.ad_name, 
  a.impressions = b.impressions, 
  a.clicks = b.clicks, 
  a.spend = b.spend, 
  a.date = b.date 
when not matched then insert row;

If I hard-code a date instead of the 'latest' variable (" where _PARTITIONTIME = '2020-10-01') as b "), there is no error. But I want to filter the source table properly, and I don't understand how this affects the following 'on' clause or why everything breaks. Could you please help? What is the proper syntax for writing such a query? And is there any other way to run this kind of merge without variables?

Answer:

declare latest timestamp;

Your variable latest is a TIMESTAMP. Make it a DATE instead, and your query should work.
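A minimal sketch of the declarations with explicit types, assuming the table and column names from the question. Here latest is declared as DATE, and wherever it is compared with the TIMESTAMP pseudo-column _PARTITIONTIME it is converted back with TIMESTAMP(latest), so both sides of the comparison have the same type.

```sql
-- Sketch only: explicit DATE types for both script variables.
DECLARE latest DATE DEFAULT (
  SELECT DATE(MAX(_PARTITIONTIME))
  FROM MyDataSet.source_t
  -- Constant filter on the pseudo-column satisfies the source's partition filter.
  WHERE _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
);

DECLARE first_date DATE DEFAULT (
  SELECT MIN(date)
  FROM MyDataSet.source_t
  -- TIMESTAMP(latest) is midnight UTC, i.e. the start of that daily partition.
  WHERE _PARTITIONTIME = TIMESTAMP(latest)
);

-- Inspect the values before using them in the MERGE.
SELECT latest, first_date;
```

Comparing _PARTITIONTIME directly against a TIMESTAMP value, rather than wrapping the column in a function, keeps the filter in the simplest form for partition pruning.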

------ Update --------

The error is complaining that MyDataSet.target_t doesn't have a usable filter on the date column. Could you try putting a.date = latest right after the on keyword? (If this is not the right filter, come up with some other constant filter.)
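A sketch of the suggested fix, not verified against the asker's data. It assumes the tables from the question and that every source row in the latest partition carries date = latest. The constant predicate a.date = latest is placed first in the ON clause as a filter on the target's partitioning column. The UPDATE assigns unqualified column names, and the original date = b.date assignment is dropped because rows only match when the dates are already equal.

```sql
-- Sketch: MERGE with a constant filter on the target's partitioning column.
DECLARE latest DATE DEFAULT (
  SELECT DATE(MAX(_PARTITIONTIME))
  FROM MyDataSet.source_t
  WHERE _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
);

MERGE `MyDataSet.target_t` AS a
USING (
  -- Only the most recent ingestion-time partition of the source.
  SELECT * FROM `MyDataSet.source_t`
  WHERE _PARTITIONTIME = TIMESTAMP(latest)
) AS b
ON
  a.date = latest  -- constant filter on the target's partitioning column
  AND a.date = b.date
  AND a.account_id = b.account_id
  AND a.campaign_id = b.campaign_id
  AND a.adset_id = b.adset_id
  AND a.ad_id = b.ad_id
WHEN MATCHED THEN UPDATE SET
  account_name = b.account_name,
  campaign_name = b.campaign_name,
  adset_name = b.adset_name,
  ad_name = b.ad_name,
  impressions = b.impressions,
  clicks = b.clicks,
  spend = b.spend
WHEN NOT MATCHED THEN INSERT ROW;
```

One caveat: in a MERGE, any source row for which the ON condition is false against every target row counts as NOT MATCHED and is inserted. If the latest source partition contains rows whose date differs from latest, those rows will be inserted again on every run. In that case, use a constant range that covers all source dates, such as a.date >= first_date.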
