How do I select a subset of records in BigQuery SQL?
I have a set of records in BigQuery with a variable (CPIRating) that I would like to use to select a subset.
CPIRating is a numeric value with a range from 0.1 to 250. I have over 10,000 records. What I am trying to create is a single subset/dataset of all the records where CPIRating is 3.0 or greater, plus four times as many records with a CPIRating below 3.0.
As an example, if the dataset has 1000 records with a CPIRating of 3.0 or greater, the query finds those and also adds a further 4000 records (4x) that are below 3.0. Those 4000 records start with the lowest CPIRating value (closest to 0.0) and are added in ascending order until 4000 is reached.
Any ideas on how to structure that query in BigQuery?
First we generate some dummy data in the table demo_tbl. Since CPIRating comes from rand() in this example, it is uniformly distributed, and we scale it to values between 0 and 3.2 so that only a small share of the rows (about 6%) has a CPIRating of 3.0 or higher.
In the table help we count the rows that have a CPIRating of 3 or higher. from demo_tbl,help joins both tables together (a cross join with the single row of help), so every row gets an additional column CPIRating_count. We number the rows by ascending CPIRating and create a row_number. Since this is a window function with over, it cannot be filtered in a where clause; a qualify clause is needed to filter the rows instead. In this filter the condition CPIRating<3.0 is not strictly needed, because rows with CPIRating>=3.0 are kept by the second condition anyway, but I find it easier to read.
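-- Dummy data: 101 rows with a uniformly random CPIRating in [0, 3.2)
with demo_tbl as (
  select *, rand() * 3.2 as CPIRating from unnest(generate_array(0, 1 * 100)) id
),
-- Single-row table: number of rows with CPIRating >= 3.0
help as (select count(1) as CPIRating_count from demo_tbl where CPIRating >= 3.0)
select *,
  row_number() over (order by CPIRating) as row_id  -- 1 = lowest CPIRating
from demo_tbl, help
-- Keep every row >= 3.0 plus the 4x lowest-rated rows (row_id starts at 1, hence <=)
qualify (row_id <= 4 * help.CPIRating_count and CPIRating < 3.0) or CPIRating >= 3.0
order by row_id desc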
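
For readers who prefer not to use qualify, the same filter can probably be written with a plain where clause by computing the row number in an extra step first. This is only a sketch under the same assumptions as the dummy data above; the CTE name numbered is my own and not part of the original answer, and the comparison uses <= so that exactly four times the count is kept.

```sql
-- Sketch: same selection, but filtering the window result with where.
with demo_tbl as (
  select *, rand() * 3.2 as CPIRating
  from unnest(generate_array(0, 100)) id
),
help as (
  -- number of rows with a rating of 3.0 or higher
  select count(1) as CPIRating_count from demo_tbl where CPIRating >= 3.0
),
numbered as (
  -- hypothetical helper CTE: attach the ascending row number to every row
  select *, row_number() over (order by CPIRating) as row_id
  from demo_tbl, help
)
select *
from numbered
-- row_id is an ordinary column here, so where can filter on it
where CPIRating >= 3.0 or row_id <= 4 * CPIRating_count
order by row_id desc
```

The extra CTE is the price for avoiding qualify; the result should match the qualify version, which stays shorter.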
The column CPIRating_count can also be generated by a window function instead of a join.
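
A minimal sketch of that window-function variant, assuming the same dummy data: countif(...) over () counts the rows with a CPIRating of 3.0 or higher across the whole table and attaches that number to every row, so the help table and the join are no longer needed. As far as I know, BigQuery may evaluate a non-recursive CTE once per reference, so reading demo_tbl only once also keeps the rand() values consistent between the count and the selected rows.

```sql
-- Sketch: CPIRating_count from a window function instead of a join.
with demo_tbl as (
  select *, rand() * 3.2 as CPIRating
  from unnest(generate_array(0, 100)) id
)
select *,
  -- empty over (): the count is taken over all rows of demo_tbl
  countif(CPIRating >= 3.0) over () as CPIRating_count,
  -- 1 = lowest CPIRating
  row_number() over (order by CPIRating) as row_id
from demo_tbl
-- all rows >= 3.0, plus the 4 * CPIRating_count lowest-rated rows
qualify CPIRating >= 3.0 or row_id <= 4 * CPIRating_count
order by row_id desc
```

On the real table, replace the demo_tbl CTE with the actual table name; the threshold 3.0 and the factor 4 come straight from the question.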