
Standard SQL: rewrite an explicit cross join to WITH clause

Consider a table transactions that has two JSONB fields, outputs and inputs. The question is: how can this query be rewritten using a WITH clause?

-- Note: This query will process 111.85 MB when run.
SELECT
    transactions.hash AS CREATED_TX_HASH,
    transactions.block_number AS CREATED_BLOCK_ID,
    transactions.block_timestamp AS CREATED_BLOCK_TIME,
    outputs.index AS CREATED_INDEX,
    outputs.value / 1e8 AS OUTPUT_VALUE_BTC,
    transactions.hash AS SPENT_CREATED_TX_HASH,
    transactions.block_number AS SPENDING_BLOCK_ID,
    transactions.block_timestamp AS SPENDING_BLOCK_TIME,
    inputs.index AS SPENT_CREATED_INDEX,
    inputs.spent_transaction_hash as SPENDING_TX_HASH,
    inputs.spent_output_index AS SPENDING_INDEX,
    inputs.value / 1e8 AS INPUT_VALUE_BTC
FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions
CROSS JOIN
    transactions.outputs as outputs
CROSS JOIN
    transactions.inputs as inputs
-- FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
--     transactions.outputs as outputs,
--     transactions.inputs as inputs   
WHERE transactions.block_timestamp_month < '2009-02-01' 
ORDER BY 3

What I need is to create CTEs that hold the temporary result sets, like this:

WITH outputs AS (
  SELECT
      transactions.hash AS CREATED_TX_HASH,
      transactions.block_number AS CREATED_BLOCK_ID,
      transactions.block_timestamp AS CREATED_BLOCK_TIME,
      outputs.index AS CREATED_INDEX,
      outputs.value / 1e8 AS OUTPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
      transactions.outputs as outputs
  WHERE transactions.block_timestamp_month < '2009-02-01'  

), inputs AS (

  SELECT
    transactions.hash AS SPENT_CREATED_TX_HASH,
    transactions.block_number AS SPENDING_BLOCK_ID,
    transactions.block_timestamp AS SPENDING_BLOCK_TIME,
    inputs.index AS SPENT_CREATED_INDEX,
    inputs.spent_transaction_hash as SPENDING_TX_HASH,
    inputs.spent_output_index AS SPENDING_INDEX,
    inputs.value / 1e8 AS INPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
      transactions.inputs as inputs
  WHERE transactions.block_timestamp_month < '2009-02-01'
)

But I don't know which SELECT statement over these two CTEs produces the same result as the original query above.

You need to join them on CREATED_BLOCK_ID and SPENDING_BLOCK_ID. In addition, I use ROW_NUMBER to avoid duplicate values.

The query below should work for you:

WITH outputs AS (
  SELECT
      transactions.hash AS CREATED_TX_HASH,
      transactions.block_number AS CREATED_BLOCK_ID,
      transactions.block_timestamp AS CREATED_BLOCK_TIME,
      outputs.index AS CREATED_INDEX,
      outputs.value / 1e8 AS OUTPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
      transactions.outputs as outputs
  WHERE transactions.block_timestamp_month < '2009-02-01'  

), inputs AS (

  SELECT
    transactions.hash AS SPENT_CREATED_TX_HASH,
    transactions.block_number AS SPENDING_BLOCK_ID,
    transactions.block_timestamp AS SPENDING_BLOCK_TIME,
    inputs.index AS SPENT_CREATED_INDEX,
    inputs.spent_transaction_hash as SPENDING_TX_HASH,
    inputs.spent_output_index AS SPENDING_INDEX,
    inputs.value / 1e8 AS INPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
      transactions.inputs as inputs
  WHERE transactions.block_timestamp_month < '2009-02-01'
)
SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY CREATED_BLOCK_ID, CREATED_INDEX, SPENDING_BLOCK_ID, SPENT_CREATED_INDEX, CREATED_TX_HASH, SPENT_CREATED_TX_HASH
      ORDER BY CREATED_BLOCK_TIME DESC
    ) AS last
  FROM outputs o
  JOIN inputs i
    ON o.CREATED_BLOCK_ID = i.SPENDING_BLOCK_ID
  ORDER BY o.CREATED_BLOCK_ID, o.CREATED_BLOCK_TIME, o.CREATED_INDEX, o.CREATED_TX_HASH
)
WHERE last = 1 AND CREATED_TX_HASH = SPENT_CREATED_TX_HASH

The output looks like this:

[image: query output]

Finally, I recommend that you stick with the CROSS JOIN query, because it performs better than the version built from subqueries in a WITH clause.
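To see why matching the two CTEs on the transaction hash reproduces the original cross join, here is a small Python sketch. It is only an illustration: the rows in `transactions` are made-up toy data, the helper names `correlated_cross_join` and `cte_join` are hypothetical, and it assumes that hash identifies exactly one row of the table.

```python
from itertools import product

# Hypothetical toy rows standing in for the transactions table.
# Each row carries its own list of outputs and inputs.
transactions = [
    {"hash": "tx_a", "outputs": [{"index": 0, "value": 50}], "inputs": []},
    {"hash": "tx_b",
     "outputs": [{"index": 0, "value": 10}, {"index": 1, "value": 40}],
     "inputs": [{"index": 0, "value": 50}]},
    {"hash": "tx_c",
     "outputs": [{"index": 0, "value": 25}],
     "inputs": [{"index": 0, "value": 10}, {"index": 1, "value": 15}]},
]


def correlated_cross_join(rows):
    """Shape of the original query: per row, pair every output with every input."""
    return [
        (t["hash"], o["index"], i["index"])
        for t in rows
        for o, i in product(t["outputs"], t["inputs"])
    ]


def cte_join(rows):
    """Shape of the CTE version: flatten each list separately, then join on hash."""
    outputs = [(t["hash"], o["index"]) for t in rows for o in t["outputs"]]
    inputs = [(t["hash"], i["index"]) for t in rows for i in t["inputs"]]
    return [
        (o_hash, o_idx, i_idx)
        for o_hash, o_idx in outputs
        for i_hash, i_idx in inputs
        if o_hash == i_hash  # plays the role of CREATED_TX_HASH = SPENT_CREATED_TX_HASH
    ]


original = sorted(correlated_cross_join(transactions))
rewritten = sorted(cte_join(transactions))
print(original == rewritten)
```

In this toy data a row with an empty inputs list (tx_a) contributes nothing in either form, and every other row contributes one result row per output and input pair. The sketch also shows the cost of the CTE route: the data is flattened twice and then matched back together, whereas the cross join walks each row once.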
