简体   繁体   中英

choose latest partition of a Bigquery table where filter over partition column is required

I have been using the following query

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) 

It is not ideal as the partition could be unavailable due to delay.. Thus I try the following queries

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) IN 
    (
       SELECT 
         MAX(DATE(_PARTITIONTIME)) AS max_partition
       FROM `project.dataset.table`
       WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    )

as well as

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE TIMESTAMP(DATE(_PARTITIONTIME)) IN 
    (
      SELECT parse_timestamp("%Y%m%d", MAX(partition_id))
      FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
      WHERE table_name = 'table'
    )

Neither of them work due to

Cannot query over table 'project.dataset.table' without a filter over column(s) '_PARTITION_LOAD_TIME', '_PARTITIONDATE', '_PARTITIONTIME' that can be used for partition elimination.

In both of your solutions the limiting filter for the partition column is calculated during the query. This lead to full table scan.

Therfore, you need to add a filter for the partition column which is always know at the beginning of the run of your query.

SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) IN 
    (
       SELECT 
         MAX(DATE(_PARTITIONTIME)) AS max_partition
       FROM `project.dataset.table`
       WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    )
AND DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)

If the last partition date could be months back, this is a better solution:


Declare max_date date;

execute immediate
"""
SELECT max(date(_PARTITIONTIME)) FROM `project.dataset.table`
WHERE DATE(_PARTITIONTIME) > "2000-12-15"  
""" into max_date;

execute immediate
"""
Select * from `project.dataset.table`  where date(_PARTITIONTIME)= date('"""  || max_date || "')"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM