How do I get the closest-match VLOOKUP for an entire column in Google BigQuery SQL?

I am trying to take a column of original prices, apply a discount percentage, and return the closest match from a predetermined set of values. These allowable values are found in another table that is just one column of prices. I am curious to hear how ties would be handled. Please note that this is for a long list of items, so the solution has to apply to an entire column. The specific syntax needed is Google BigQuery.

I envision this functioning similarly to Excel's VLOOKUP with approximate match = 1. In practice, I will apply the same solution to multiple price points in the results table (e.g. origPrice, 25%off, 50%off, 75%off, etc.), but I figured that I could copy-paste the solution multiple times.

The example below shows a 50% price reduction.

allowableDiscounts

discountPrice
$51.00
$48.50
$40.00

productInfo

Item     OrigPrice
Apple    $100.00
Banana   $98.00

Desired Output

Item     OrigPrice   exact50off   closestMatch
Apple    $100.00     $50.00       $51.00
Banana   $98.00      $44.00       $40.00

I have researched solutions here and elsewhere. Most of what I found suggested sorting the allowableDiscounts table by the absolute value of the difference between exact50off and discountPrice. That worked great for a single value, but I could not figure out how to apply it to an entire list of prices.

I have workarounds in both SQL and Excel that accomplish the same task manually, but I am looking for something that matches the function described above, so that if the allowableDiscounts table changes, the calculations reflect that without recoding.

SELECT
   p.Item,
   p.OrigPrice,
   p.OrigPrice * 0.5 AS exact50off
   --new code from allowableDiscounts.discountPrice
FROM
   productInfo AS p
WHERE
   --filters applied as needed

You can work it out with a CROSS JOIN, then compute the smallest difference and filter out the other generated records (those with larger differences).

The smallest difference is found by ranking all differences within each partition <Item, OrigPrice> (with ROW_NUMBER); every row ranked higher than 1 is then discarded.

WITH cte AS (
    SELECT *,
           OrigPrice*0.5 AS exact50off,
           ROW_NUMBER() OVER(PARTITION BY Item, OrigPrice ORDER BY ABS(discountPrice - OrigPrice*0.5)) AS rn
    FROM productInfo
    CROSS JOIN allowableDiscounts
)
SELECT Item, 
       OrigPrice, 
       exact50off,
       discountPrice
FROM cte
WHERE rn = 1
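
On the question of ties: ROW_NUMBER ordered only by the absolute difference does not say which of two equally close prices gets rank 1. The following is a minimal plain-Python sketch of the same pair-then-rank logic, not BigQuery code; the function name and the "higher price wins a tie" rule are assumptions made for illustration. In the query, a second key in the window's ORDER BY (for example discountPrice DESC) would play the same role.

```python
def closest_by_cross_join(products, allowed, factor=0.5):
    """Model of CROSS JOIN + ROW_NUMBER() ... WHERE rn = 1 for one discount factor."""
    results = []
    for item, orig_price in products:
        exact = orig_price * factor
        # Rank every allowed price by distance to the exact discounted price.
        # The second sort key (higher price first) is an ASSUMED tie-break rule.
        ranked = sorted(allowed, key=lambda price: (abs(price - exact), -price))
        results.append((item, orig_price, exact, ranked[0]))
    return results


products = [("Apple", 100.0), ("Banana", 98.0)]
allowed = [51.0, 48.5, 40.0]
rows = closest_by_cross_join(products, allowed)
for row in rows:
    print(row)
```

Note that half of 98 is 49, so this sketch returns 48.5 for Banana rather than the 40 shown in the question's desired output; the values 44 and 40 there would correspond to an original price of 88.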

Use the ABS(X) function to compute the absolute difference between each exact discounted price and the values in allowableDiscounts, and treat either an exact match or a difference between 0.01 and 4 as a match for the various discount values, as below. Use a LEFT JOIN if you need every row of the leading table productInfo, with either the matching values or NULL from the allowableDiscounts table (the query below uses an inner JOIN, which drops products without a match).

SELECT
   p.Item,
   p.OrigPrice,
   p.OrigPrice * 0.75 AS exact25off,  -- 25% off leaves 75% of the price
   p.OrigPrice * 0.5 AS exact50off,
   p.OrigPrice * 0.25 AS exact75off,  -- 75% off leaves 25% of the price
   q.discountPrice AS closestMatch
FROM
   productInfo AS p
-- a row is produced for every allowable price within 4.0 of any of the three exact prices
JOIN allowableDiscounts AS q
   ON ABS(p.OrigPrice * 0.50 - q.discountPrice) = 0
   OR ABS(p.OrigPrice * 0.50 - q.discountPrice) BETWEEN 0.01 AND 4.0
   OR ABS(p.OrigPrice * 0.25 - q.discountPrice) = 0
   OR ABS(p.OrigPrice * 0.25 - q.discountPrice) BETWEEN 0.01 AND 4.0
   OR ABS(p.OrigPrice * 0.75 - q.discountPrice) = 0
   OR ABS(p.OrigPrice * 0.75 - q.discountPrice) BETWEEN 0.01 AND 4.0;
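
A tolerance-based join matches every allowable price inside the window, so a product can get several rows or none at all. The sketch below models that behaviour in plain Python for a single exact price; the function name is hypothetical, and the condition is simplified to "difference at most 4.0", which differs from the query only for differences strictly between 0 and 0.01.

```python
def matches_within_tolerance(exact_price, allowed, tolerance=4.0):
    """Return every allowed price whose distance from exact_price is at most tolerance."""
    return [price for price in allowed if abs(price - exact_price) <= tolerance]


allowed = [51.0, 48.5, 40.0]
for exact_price in (50.0, 44.0, 30.0):
    # 50.0 has two candidates, 44.0 has one, 30.0 has none.
    print(exact_price, matches_within_tolerance(exact_price, allowed))
```

If exactly one closest value per product is required, a ranking step on the absolute difference would still be needed on top of this join.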

If the tables are large, as you stated, a cross join can become impractical; a window function avoids it.

First we create a function nearest, which returns whichever element (x or y) is closest to a target value.

Then we define both tables, allowableDiscounts and productInfo. Next, we union these tables as helper. The first column, tmp, holds the value 1 if the row comes from the main table productInfo, and for those rows we calculate the column exact50off. For rows from allowableDiscounts, tmp is set to 0 and the exact50off column is filled with the discountPrice entries. We add the table allowableDiscounts a second time, this time filling the column exact75off.

We query the helper table and use:

last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
  • tmp=0: keeps only entries from the table allowableDiscounts
  • last_value: gets the nearest lower value from the table allowableDiscounts

We run the same expression again, but with desc, to obtain the nearest higher value.

The function nearest then yields whichever of the two is closer.

The same is done analogously for exact75off.

create temp function nearest(target any type, x any type, y any type) as (
  # return y when x is missing or y is strictly closer to target; otherwise x
  if(x is null or abs(target-x) > abs(target-y), y, x)
);

with allowableDiscounts as (select * from  unnest([51,48.5,40,23,20]) as discountPrice ),
productInfo as (select "Apple" as item, 100 as OrigPrice union all select "Banana",98 union all select "Banana cheap",88),

helper as (
  select 1 as tmp, # marks which table the row comes from (1 = productInfo, 0 = allowableDiscounts)
  item,OrigPrice, # all columns of the table productInfo (2)
  OrigPrice/2 as exact50off, # calc 50%
  OrigPrice*0.25 as exact75off, # calc 75%
   from  productInfo
  
  union all # table for 50%
  select 0 as tmp, 
  null,null, # (2) null entries, because the table productInfo has two columns (2)
  discountPrice as exact50off, #possible values for 50% off
  null # other calc (75%)
  from  allowableDiscounts 

  union all # table for 75%
  select 0 as tmp, 
  null,null, # (2) null entries, because the table productInfo has two columns (2)
  null, # other calc (50%)
  discountPrice, #possible values for 75% off  
  from  allowableDiscounts 
  
  )

  select *,
  nearest(exact50off,
  last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
  last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off desc)
  ) as  closestMatch50off,
  nearest(exact75off,
  last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off),
  last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off desc)
  ) as  closestMatch75off,

  from helper
 qualify tmp=1
  order by exact50off
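
The lower-neighbour / higher-neighbour idea can be sketched outside SQL with a binary search over the sorted allowable prices. This is a plain-Python model, not BigQuery code: the function name is hypothetical, and the handling of a missing neighbour (returning the one that exists) is an assumption of the sketch.

```python
from bisect import bisect_left, bisect_right


def nearest_allowed(target, sorted_allowed):
    """Pick the closer of the nearest lower and nearest higher allowed price."""
    i = bisect_right(sorted_allowed, target)  # number of values <= target
    j = bisect_left(sorted_allowed, target)   # index of the first value >= target
    lower = sorted_allowed[i - 1] if i > 0 else None
    upper = sorted_allowed[j] if j < len(sorted_allowed) else None
    # ASSUMPTION: when one neighbour is missing, fall back to the other.
    if lower is None:
        return upper
    if upper is None:
        return lower
    # Same comparison as nearest(): keep lower unless upper is strictly closer.
    return upper if abs(target - lower) > abs(target - upper) else lower


sorted_allowed = [20.0, 23.0, 40.0, 48.5, 51.0]
for target in (50.0, 49.0, 44.0, 15.0, 60.0):
    print(target, nearest_allowed(target, sorted_allowed))
```

With this comparison a tie goes to the lower price, because the higher one is chosen only when it is strictly closer.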

Yet another approach

create temp function vlookup(data array<float64>, key float64) 
returns string language js as r'''
  let closestMatch = null;
  let closestDifference = Number.MAX_VALUE;

  for (let i = 0; i < data.length; i++) {
    const difference = Math.abs(data[i] - key);
    // strict "<" keeps the first of several equally close values
    if (difference < closestDifference) {
      closestMatch = data[i];
      closestDifference = difference;
    }
  }
  return closestMatch;
''';
with priceOffList as (
  select *
  from unnest([25, 50, 75]) off
)
select * from (
  select Item, OrigPrice,  off, offPrice, vlookup(arr, offPrice) as closestMatch
  from productInfo,(select array_agg(discountPrice order by discountPrice) arr from allowableDiscounts), priceOffList
  ,unnest([OrigPrice * off / 100]) as offPrice
)
pivot (any_value(offPrice) offPrice, any_value(closestMatch) closestMatch for off in (25, 50, 75))       

If applied to the sample data in your question, the output is:

[image: query output]
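
To see how the linear scan in the JavaScript function treats ties, here is a rough Python port of its loop; it is only a model for experimenting locally, not something BigQuery runs. Because the comparison is strict and the array is aggregated in ascending order of discountPrice, the first (lowest) of two equally close prices is kept.

```python
def vlookup(data, key):
    """Linear scan for the value in data closest to key; the first minimum wins."""
    closest_match = None
    closest_difference = float("inf")
    for value in data:
        difference = abs(value - key)
        if difference < closest_difference:  # strict: later ties do not replace
            closest_match = value
            closest_difference = difference
    return closest_match


arr = sorted([51.0, 48.5, 40.0])  # mirrors array_agg(... order by discountPrice)
for key in (50.0, 49.0, 49.75):
    print(key, vlookup(arr, key))
```

Sorting the array in descending order instead would make the higher price win a tie.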

