How do I get the closest-match VLOOKUP for an entire column in Google BigQuery SQL?
I am trying to take a column of original prices, apply a discount percentage, and return the closest match from a predetermined set of values. These allowable values are stored in another table that consists of a single column of prices. I am also curious to hear how ties would be handled. Please note that this is for a long list of items, so the solution has to apply to an entire column. The syntax needs to be Google BigQuery SQL.

I envision this functioning similarly to Excel's VLOOKUP with approximate match = 1. In practice, I will apply the same solution to multiple price points in the results table (e.g. origPrice, 25%off, 50%off, 75%off), but I figured that I could copy and paste the solution multiple times.

The example below shows a 50% price reduction.

allowableDiscounts

discountPrice |
---|
$51.00 |
$48.50 |
$40.00 |

productInfo

Item | OrigPrice |
---|---|
Apple | $100.00 |
Banana | $98.00 |

Desired Output

Item | OrigPrice | exact50off | closestMatch |
---|---|---|---|
Apple | $100.00 | $50.00 | $51.00 |
Banana | $98.00 | $44.00 | $40.00 |

I have researched solutions here and elsewhere. Most of what I found suggested sorting the allowableDiscounts table by the absolute value of the difference between exact50off and discountPrice. That worked well for a single value, but I could not figure out how to apply it to an entire list of prices.

I have workarounds in both SQL and Excel that accomplish the same task manually, but I am looking for something that matches the function described above, so that if the allowableDiscounts table changes, the calculations reflect the change without recoding.

    SELECT
        p.Item,
        p.OrigPrice,
        p.OrigPrice * 0.5 AS exact50off
        --new code from allowableDiscounts.discountPrice
    FROM
        productInfo AS p
    WHERE
        --filters applied as needed

You can solve this with a `CROSS JOIN`: pair every product with every allowable price, compute the difference for each pair, then filter out all generated rows except the one with the smallest difference.

The smallest difference is found by ranking the differences within each partition <Item, OrigPrice> with `ROW_NUMBER`, then discarding every row ranked higher than 1.
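
    WITH cte AS (
        -- Pair every product with every allowable price, then rank the pairs
        -- per product by distance from the 50%-off price (closest first).
        SELECT *,
               OrigPrice * 0.5 AS exact50off,
               ROW_NUMBER() OVER (
                   PARTITION BY Item, OrigPrice
                   ORDER BY ABS(discountPrice - OrigPrice * 0.5)
               ) AS rn
        FROM productInfo
        CROSS JOIN allowableDiscounts
    )
    -- Keep only the closest allowable price for each product.
    SELECT Item,
           OrigPrice,
           exact50off,
           discountPrice AS closestMatch
    FROM cte
    WHERE rn = 1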
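
On the question of ties: `ROW_NUMBER` assigns distinct ranks even when two allowable prices are equally far from the target, so which of them receives rank 1 is not guaranteed unless the `ORDER BY` carries a second key. The ranking logic can be sketched outside BigQuery in plain JavaScript (the language BigQuery also accepts for UDFs). This is only an illustration: the function name `closestMatch` and the choice to prefer the lower price on a tie are assumptions, not part of the answer above.

```javascript
// Hypothetical helper: emulates "CROSS JOIN + ROW_NUMBER, keep rank 1"
// for a single target price, with an explicit tie-break.
function closestMatch(target, allowedPrices) {
  const ranked = allowedPrices
    .map((price) => ({ price, diff: Math.abs(price - target) }))
    // Primary key: smallest difference.
    // Secondary key (assumption): the lower price wins a tie.
    .sort((a, b) => a.diff - b.diff || a.price - b.price);
  return ranked.length > 0 ? ranked[0].price : null;
}

// Sample data from the question.
const allowableDiscounts = [51.0, 48.5, 40.0];
const productInfo = [
  { Item: "Apple", OrigPrice: 100.0 },
  { Item: "Banana", OrigPrice: 98.0 },
];

// One result row per product, like the rows that survive the rank filter.
const results = productInfo.map((p) => {
  const exact50off = p.OrigPrice * 0.5;
  return {
    ...p,
    exact50off,
    closestMatch: closestMatch(exact50off, allowableDiscounts),
  };
});

console.log(results);
```

In the SQL itself, the equivalent deterministic tie-break would be a second sort key, for example `ORDER BY ABS(discountPrice - OrigPrice * 0.5), discountPrice`.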

Use the ABS(X) function to compute the absolute difference between each discounted price and the allowable prices, and treat a row as a match when that difference is either 0 (an exact match) or between 0.01 and 4.0, for each of the discount levels below. The query uses a plain JOIN, so a product with no allowable price inside that tolerance is dropped, and a product with several allowable prices inside it is returned once per match. If every row of the leading table productInfo must appear, with either the matching values or NULL from the allowableDiscounts table, an outer join such as LEFT JOIN is the usual tool.

    SELECT
        p.Item,
        p.OrigPrice,
        p.OrigPrice * 0.50 AS exact50off,
        p.OrigPrice * 0.75 AS exact25off,  -- 25% off leaves 75% of the price
        p.OrigPrice * 0.25 AS exact75off,  -- 75% off leaves 25% of the price
        q.discountPrice AS closestMatch
    FROM
        productInfo AS p
    -- Match when an allowable price equals a discounted price exactly
    -- or differs from it by 0.01 to 4.0.
    JOIN allowableDiscounts AS q
        ON ABS(p.OrigPrice * 0.50 - q.discountPrice) = 0
        OR ABS(p.OrigPrice * 0.50 - q.discountPrice) BETWEEN 0.01 AND 4.0
        OR ABS(p.OrigPrice * 0.25 - q.discountPrice) = 0
        OR ABS(p.OrigPrice * 0.75 - q.discountPrice) = 0
        OR ABS(p.OrigPrice * 0.25 - q.discountPrice) BETWEEN 0.01 AND 4.0
        OR ABS(p.OrigPrice * 0.75 - q.discountPrice) BETWEEN 0.01 AND 4.0;

If the tables are large, as you stated, a cross join may be too expensive; an approach based on window functions avoids it.

First we create a function `nearest`, which returns whichever of two elements (x or y) is closest to a target value.

Then we define both tables, allowableDiscounts and productInfo. Next, we union these tables as `helper`. The first column, `tmp`, holds the value `1` if the row comes from the main table `productInfo`, and for those rows we calculate the column `exact50off`. For rows from the table allowableDiscounts, the `tmp` column is set to `0` and the `exact50off` column is filled with the `discountPrice` entries. We add the table allowableDiscounts a second time, this time filling the column exact75off.

We query the `helper` table and use:

    last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),

`tmp=0`: keep only entries that come from the table allowableDiscounts.

`last_value`: get the nearest lower value from the table allowableDiscounts.

We run the same expression again, but with `desc`, to obtain the nearest higher value.

The function `nearest` then returns whichever of the two is closer to the target. The same is done analogously for `exact75off`.

    # Returns whichever of x and y is closer to target; x wins ties.
    # coalesce(x, y) covers the case where x is null (no candidate on that side).
    create temp function nearest(target any type, x any type, y any type) as (
      if(abs(target - x) > abs(target - y), y, coalesce(x, y))
    );
    with allowableDiscounts as (select * from unnest([51, 48.5, 40, 23, 20]) as discountPrice),
    productInfo as (
      select "Apple" as item, 100 as OrigPrice
      union all select "Banana", 98
      union all select "Banana cheap", 88
    ),
    helper as (
      select 1 as tmp,                  # 1 = row comes from productInfo, 0 = row comes from allowableDiscounts
        item, OrigPrice,                # all columns of the table productInfo (2)
        OrigPrice / 2 as exact50off,    # price at 50% off
        OrigPrice * 0.25 as exact75off, # price at 75% off
      from productInfo
      union all                         # candidate rows for 50% off
      select 0 as tmp,
        null, null,                     # two null entries, because the table productInfo has two columns
        discountPrice as exact50off,    # possible values for 50% off
        null                            # not used for the 75% lookup
      from allowableDiscounts
      union all                         # candidate rows for 75% off
      select 0 as tmp,
        null, null,                     # two null entries, because the table productInfo has two columns
        null,                           # not used for the 50% lookup
        discountPrice,                  # possible values for 75% off
      from allowableDiscounts
    )
    select *,
      nearest(exact50off,
        last_value(if(tmp=0, exact50off, null) ignore nulls) over (order by exact50off),      # nearest lower value
        last_value(if(tmp=0, exact50off, null) ignore nulls) over (order by exact50off desc)  # nearest higher value
      ) as closestMatch50off,
      nearest(exact75off,
        last_value(if(tmp=0, exact75off, null) ignore nulls) over (order by exact75off),
        last_value(if(tmp=0, exact75off, null) ignore nulls) over (order by exact75off desc)
      ) as closestMatch75off,
    from helper
    qualify tmp=1
    order by exact50off
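
The idea behind the two `last_value` windows (find the nearest lower and the nearest higher allowable price, then keep the closer one) can be sketched outside BigQuery in plain JavaScript. Note that this sketch swaps the window functions for a binary search over a sorted array; it is not a translation of the query. The name `closestInSorted` and the null fallback when one side has no candidate are assumptions made for illustration.

```javascript
// Mirrors the SQL `nearest` function: x wins ties.
// Assumption: if one side is missing (null), the other side is returned.
function nearest(target, x, y) {
  if (x === null) return y;
  if (y === null) return x;
  return Math.abs(target - x) > Math.abs(target - y) ? y : x;
}

// Hypothetical helper: binary search for the first allowed value >= target
// in an ascending-sorted array, then compare it with its lower neighbour.
function closestInSorted(sortedAllowed, target) {
  let lo = 0;
  let hi = sortedAllowed.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedAllowed[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  // lo is now the index of the first element >= target (or the array length).
  const higher = lo < sortedAllowed.length ? sortedAllowed[lo] : null;
  const lower = lo > 0 ? sortedAllowed[lo - 1] : null;
  return nearest(target, lower, higher);
}

// Data from the answer above, sorted ascending.
const sortedAllowed = [51, 48.5, 40, 23, 20].sort((a, b) => a - b);
const items = [
  { item: "Apple", OrigPrice: 100 },
  { item: "Banana", OrigPrice: 98 },
  { item: "Banana cheap", OrigPrice: 88 },
];

const matched = items.map((p) => ({
  ...p,
  closestMatch50off: closestInSorted(sortedAllowed, p.OrigPrice / 2),
  closestMatch75off: closestInSorted(sortedAllowed, p.OrigPrice * 0.25),
}));

console.log(matched);
```

Sorting once and searching per row keeps the work per product logarithmic in the number of allowable prices, which is the same motivation as avoiding the cross join.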

Yet another approach

    create temp function vlookup(data array<float64>, key float64)
    returns float64 language js as r'''
      // Linear scan for the array element closest to key.
      // The strict < keeps the first of two equally close values; because the
      // array is passed in sorted ascending, that is the lower price.
      let closestMatch = null;
      let closestDifference = Number.MAX_VALUE;
      for (let i = 0; i < data.length; i++) {
        const difference = Math.abs(data[i] - key);
        if (difference < closestDifference) {
          closestMatch = data[i];
          closestDifference = difference;
        }
      }
      return closestMatch;
    ''';
    with priceOffList as (
      select *
      from unnest([25, 50, 75]) off
    )
    select * from (
      select Item, OrigPrice, off, offPrice, vlookup(arr, offPrice) as closestMatch
      from productInfo,
        (select array_agg(discountPrice order by discountPrice) arr from allowableDiscounts),
        priceOffList,
        unnest([OrigPrice * off / 100]) as offPrice
    )
    pivot (any_value(offPrice) offPrice, any_value(closestMatch) closestMatch for off in (25, 50, 75))
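
To see what the lookup and the pivot produce together, the same loop can be run outside BigQuery in plain JavaScript. This is a rough sketch under stated assumptions: the property names `offPrice_25`, `closestMatch_25`, and so on are illustrative stand-ins for the pivoted columns, and the wide-row construction only imitates the pivot, it does not use any BigQuery API.

```javascript
// Same linear scan as the UDF body; the strict < keeps the first of two
// equally close values.
function vlookup(data, key) {
  let closestMatch = null;
  let closestDifference = Number.MAX_VALUE;
  for (const value of data) {
    const difference = Math.abs(value - key);
    if (difference < closestDifference) {
      closestMatch = value;
      closestDifference = difference;
    }
  }
  return closestMatch;
}

// Sample data from the question; sorted ascending like the aggregated array.
const allowed = [51.0, 48.5, 40.0].sort((a, b) => a - b);
const products = [
  { Item: "Apple", OrigPrice: 100 },
  { Item: "Banana", OrigPrice: 98 },
];

// One wide row per product with one column pair per percentage
// (illustrative column names, see the note above).
const wideRows = products.map((p) => {
  const row = { ...p };
  for (const off of [25, 50, 75]) {
    const offPrice = (p.OrigPrice * off) / 100;
    row[`offPrice_${off}`] = offPrice;
    row[`closestMatch_${off}`] = vlookup(allowed, offPrice);
  }
  return row;
});

console.log(wideRows);
```

Because the array is sorted ascending before the scan, a tie between two allowable prices resolves to the lower one; sorting descending instead would flip that preference.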

Applied to the sample data in your question, the query returns one row per item, with an offPrice and a closestMatch column for each of the percentages 25, 50, and 75.