
Every day I receive a new table in BigQuery and want to append its data to a main table; the schemas are the same

Every day I receive a new table (for example, tablename_20220811) in BigQuery. I want to append this new table's data to main_table. The schemas are the same.

I tried using wildcards, but I don't know how to select only the table loaded that day.

You can use BigQuery scheduled queries, setting a recurring interval in the schedule parameter:

Example with the bq command-line tool:

bq query \
  --use_legacy_sql=false \
  --destination_table=mydataset.desttable \
  --display_name='My Scheduled Query' \
  --schedule='every 24 hours' \
  --append_table=true \
  'SELECT
    *
   FROM
    `mydataset.tablename_*`
   WHERE _TABLE_SUFFIX = FORMAT_DATE("%Y%m%d", CURRENT_DATE())'

To target the expected table, I used a wildcard and a filter on the table suffix. The table suffix must equal the current date as a STRING in the format yyyymmdd.

The schedule runs the query every day.

You can also configure it directly in the Google Cloud console.
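The suffix filter can also be built outside SQL, for instance when a script has to target a specific day's table instead of relying on CURRENT_DATE(). Below is a minimal Python sketch, assuming the names from the question (the tablename_ prefix and main_table). The helpers shard_suffix and build_append_sql are hypothetical. The sketch only produces the statement text and does not call BigQuery. It also assumes the daily table's columns are in the same order as the main table's.

```python
from datetime import date


def shard_suffix(day: date) -> str:
    """Return the yyyymmdd suffix used by the daily tables."""
    return day.strftime("%Y%m%d")


def build_append_sql(dataset: str, prefix: str, main_table: str, day: date) -> str:
    """Build an INSERT ... SELECT that appends one day's table to the main table.

    Hypothetical helper: dataset, prefix and main_table are placeholders.
    """
    return (
        f"INSERT INTO `{dataset}.{main_table}`\n"
        f"SELECT * FROM `{dataset}.{prefix}*`\n"
        f"WHERE _TABLE_SUFFIX = '{shard_suffix(day)}'"
    )


if __name__ == "__main__":
    print(build_append_sql("mydataset", "tablename_", "main_table", date(2022, 8, 11)))
```

Passing the date in explicitly makes it straightforward to re-run the append for a day that was missed.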

It sounds like your tables already follow the naming format that lets BigQuery treat them as a single 'date-sharded table'.

You need to ensure that the daily tables:

  • have the same schema
  • are in the same dataset
  • have the same name apart from the _yyyymmdd suffix

You will know this worked because only one table will appear (with an icon showing multiple tables, rather than the usual icon).
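The naming rule can be checked before relying on the wildcard. The following is a rough Python sketch, assuming the table names have already been listed by some other means. The regular expression and the helper group_shards are illustrative assumptions. The check covers only the name shape (a prefix followed by an underscore and eight digits), not whether the digits form a real date or whether the schemas match.

```python
import re
from collections import defaultdict

# Assumed convention from the answer: <prefix>_<yyyymmdd>.
SHARD_NAME = re.compile(r"^(?P<prefix>.+)_(?P<suffix>\d{8})$")


def group_shards(table_names):
    """Group table names by prefix; names that do not match go under None."""
    groups = defaultdict(list)
    for name in table_names:
        match = SHARD_NAME.match(name)
        key = match.group("prefix") if match else None
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["tablename_20220811", "tablename_20220812", "tablename_2022-08-13"]
    print(group_shards(names))
```

Any name that lands under the None key would not be picked up by a `tablename_*` filter on a yyyymmdd suffix.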

With this in hand, you can write queries like:

    SELECT
        fieldA,
        fieldB
    FROM
        `some_dataset.tablename_*`
    WHERE
        _table_suffix BETWEEN '20220101' AND '20221201'

This gives you some idea of what's possible:

  • select from the full date-sharded table using backticks (essential!) and the wildcard syntax
  • filter using the special _table_suffix meta-field
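The BETWEEN filter works on strings because zero-padded yyyymmdd suffixes sort lexically in the same order as the dates they encode. Here is a small Python sketch of that property. The helper names are illustrative only, and nothing here calls BigQuery.

```python
from datetime import date, timedelta


def suffixes_between(start: date, end: date):
    """Return the yyyymmdd suffixes for every day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


def in_range(suffix: str, low: str, high: str) -> bool:
    """Mimic `_table_suffix BETWEEN low AND high` using plain string comparison."""
    return low <= suffix <= high


if __name__ == "__main__":
    suffixes = suffixes_between(date(2021, 12, 30), date(2022, 1, 2))
    # Sorting the strings gives the same order as sorting the dates.
    print(suffixes == sorted(suffixes))
```

A suffix in another format, such as dd-mm-yyyy, would not have this property, so a string range filter on it would select the wrong tables.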

