

BigQuery Fuzzy Match Join Or Using A Range

In BigQuery, is there a way to use a fuzzy match in a join, or to match across a range of values, possibly using a regular expression?

For example, in the following query the "duration" values may differ by +/- 30. So if callhistory.duration = 268, it should match calltracking.duration = 292, which falls within the range 238 to 298 (268 - 30 to 268 + 30).

select
  calltracking.date,
  calltracking.calling_phone_number,
  calltracking.duration,
  callhistory.row_date,
  callhistory.callid,
  callhistory.calling_pty,
  callhistory.duration,
  calltracking.start_time_utc,
  callhistory.segstart_utc
from (
  SELECT
    cast(date(start_time_local) as string) as date,
    calling_phone_number,
    start_time_utc,
    duration,
    utm_medium,
    utm_source
  FROM [xxx:calltracking.calls]) calltracking
left join (
  select *
  FROM [xxx:datamart.callhistory]) callhistory
on (callhistory.calling_pty = calltracking.calling_phone_number)
  and (callhistory.row_date = calltracking.date)
  and (callhistory.duration = calltracking.duration)

The simplified example below is for BigQuery Standard SQL:

#standardSQL
-- Sample rows standing in for the real tables
WITH `xxx.calltracking.calls` AS (
  SELECT 1 id, 292 duration
), `xxx.datamart.callhistory` AS (
  SELECT 2 id, 268 duration
)
SELECT
  t.id tid,
  t.duration tduration,
  h.id hid,
  h.duration hduration
FROM `xxx.calltracking.calls` t
LEFT JOIN `xxx.datamart.callhistory` h
  -- match when the two durations differ by at most 30
  ON t.duration BETWEEN h.duration - 30 AND h.duration + 30

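Note: this will not work in BigQuery legacy SQL (#legacySQL), which your question appears to use (the bracketed [xxx:calltracking.calls] table references are legacy syntax). Legacy SQL only accepts equality comparisons, combined with AND, in a join's ON clause, so a range condition like BETWEEN requires Standard SQL.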
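For the full query in the question, the range condition can simply be ANDed with the exact-match conditions on phone number and date. The sketch below is a minimal illustration, not a drop-in replacement: the CTEs calltracking and callhistory hold hypothetical sample rows standing in for xxx.calltracking.calls and xxx.datamart.callhistory, and it assumes row_date compares directly with the computed date column (the question casts the date to a string, so strings are used here).

```sql
-- Run as Standard SQL (add the #standardSQL prefix or disable legacy SQL).
-- Hypothetical sample rows; replace the CTEs with the real tables.
WITH calltracking AS (
  SELECT '2017-05-01' AS date, '5551234567' AS calling_phone_number, 292 AS duration UNION ALL
  SELECT '2017-05-01', '5559876543', 120
), callhistory AS (
  SELECT '2017-05-01' AS row_date, 101 AS callid, '5551234567' AS calling_pty, 268 AS duration UNION ALL
  SELECT '2017-05-01', 102, '5559876543', 400
)
SELECT
  t.date,
  t.calling_phone_number,
  t.duration AS tracking_duration,
  h.callid,
  h.duration AS history_duration
FROM calltracking AS t
LEFT JOIN callhistory AS h
  -- exact matches on phone number and date ...
  ON h.calling_pty = t.calling_phone_number
  AND h.row_date = t.date
  -- ... plus a +/- 30 tolerance on duration
  AND t.duration BETWEEN h.duration - 30 AND h.duration + 30
```

With these sample rows, the first tracking call pairs with callid 101 (292 - 268 = 24, within 30), while the second keeps NULL history columns because no history row for that number falls within the tolerance. Unlike an equality join, a range condition can match more than one history row, in which case the tracking row appears once per match.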

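Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.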

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM