简体   繁体   中英

How do I get the last update time of a sequence of tables in BigQuery?

A BigQuery best practice is to split timeseries in daily tables (as "NAME_yyyyMMdd") and then use Table Wildcards to query one or more of these tables.

Sometimes it is useful to get the last update time on a certain set of data (ie to check correctness of the ingestion procedure). How do I get the last update time over a set of tables organized like that?

A good way to achieve that is to use the __TABLES__ meta-table. Here is a generic query I use in several projects:

SELECT
  MAX(last_modified_time) LAST_MODIFIED_TIME,
  IF(REGEXP_MATCH(RIGHT(table_id,8),"[0-9]{8}"),LEFT(table_id,LENGTH(table_id) - 8),table_id) AS TABLE_ID
FROM
  [my_dataset.__TABLES__]
GROUP BY
  TABLE_ID

It will return the last update time of every table in my_dataset . For tables organized with a daily-split structure, it will return a single value (the update time of the latest table), with the initial part of their name as TABLE_ID .

SELECT * FROM project_name.data_set_name .INFORMATION_SCHEMA.PARTITIONS where table_name='my_table';

Solution for Google

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM