
BigQuery: Filter by _PARTITIONTIME doesn't propagate on LEFT JOIN

I have 2 partitioned tables:

Table 1:

|user_id|request_id|

Table 2:

|ip|user_id|request_id|


For every ip in partition_table2, I want to get:
- the users count (from partition_table1)
- the users' requests (from partition_table1)
- the users' requests (from partition_table2), for users present in partition_table1

Info: ip is tied to request_id from Table 1, because one user_id can have more than one ip.

Issue: When I filter by _PARTITIONTIME in the main query, the filter doesn't propagate to the query in the WITH clause when I use LEFT JOIN, but it is propagated when I use INNER JOIN.

Partition pruning (https://cloud.google.com/bigquery/docs/querying-partitioned-tables) doesn't seem to work for LEFT JOIN.

My query:

WITH
  users_info AS (
  SELECT
    t2.ip,
    t1.user_id,
    COUNT(DISTINCT t1.request_id) AS user_requests,
    t1._PARTITIONTIME AS date
  FROM partitioned_table1 t1
  INNER JOIN partition_table2 t2
    ON t1.request_id = t2.request_id
    AND t1._PARTITIONTIME = t2._PARTITIONTIME
  GROUP BY t2.ip, t1.user_id, t1._PARTITIONTIME
  )
SELECT
  t2.ip,
  COUNT(DISTINCT m.user_id) AS users,
  COUNT(DISTINCT t2.request_id) AS t2_users_requests,
  SUM(m.user_requests) AS t1_users_requests
FROM partition_table2 t2
LEFT JOIN users_info m  -- also tried with INNER JOIN in place of LEFT JOIN
  ON t2.ip=m.ip
  AND t2.user_id=m.user_id
  AND m.date = t2._PARTITIONTIME
WHERE DATE(t2._PARTITIONTIME) = "2019-05-20" 
GROUP BY t2.ip

With INNER JOIN this query processes ~4 GB, but with LEFT JOIN it processes ~3 TB.

Did I do something wrong, or is this behaviour expected?


EDIT

I need this query to create a VIEW. The condition DATE(t2._PARTITIONTIME) = "2019-05-20" from the query above is the one I'll use to filter the VIEW when I query it.

The columns from the right side of a LEFT OUTER JOIN can potentially be NULL, so yes, BigQuery actually needs to execute the join to figure out the results rather than filtering partitions in advance. If you don't want this behavior, use a subquery where you filter on _PARTITIONTIME prior to the join.
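
As a rough sketch of that suggestion, applied to the query from the question: the date filter is repeated inside the WITH subquery, so both partitioned tables are restricted before the outer join runs. The table and column names are the ones from the question; the hard-coded date is only an illustration, and how much data this actually scans has not been measured here.

```sql
WITH
  users_info AS (
  SELECT
    t2.ip,
    t1.user_id,
    COUNT(DISTINCT t1.request_id) AS user_requests,
    t1._PARTITIONTIME AS date
  FROM partitioned_table1 t1
  INNER JOIN partition_table2 t2
    ON t1.request_id = t2.request_id
    AND t1._PARTITIONTIME = t2._PARTITIONTIME
  -- Filter on _PARTITIONTIME here, prior to the LEFT JOIN below,
  -- instead of relying on the outer WHERE clause to be pushed down.
  WHERE DATE(t1._PARTITIONTIME) = "2019-05-20"
    AND DATE(t2._PARTITIONTIME) = "2019-05-20"
  GROUP BY t2.ip, t1.user_id, t1._PARTITIONTIME
  )
SELECT
  t2.ip,
  COUNT(DISTINCT m.user_id) AS users,
  COUNT(DISTINCT t2.request_id) AS t2_users_requests,
  SUM(m.user_requests) AS t1_users_requests
FROM partition_table2 t2
LEFT JOIN users_info m
  ON t2.ip = m.ip
  AND t2.user_id = m.user_id
  AND m.date = t2._PARTITIONTIME
WHERE DATE(t2._PARTITIONTIME) = "2019-05-20"
GROUP BY t2.ip
```

Note that this form fixes the date inside the subquery, so it does not directly cover the VIEW use case from the EDIT, where the date is supplied only when the view is queried.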
