Google BigQuery SQL: Order two columns independently
Say I have some data like:
grp v1 v2
--- -- --
2 5 7
2 4 9
3 10 2
3 11 1
I'd like to create new columns whose ordering is independent of the table's row order, so that the two columns are sorted independently of each other: sort v1 without regard to v2 (and vice versa), while partitioning by grp.
The result (independently ordered, partitioned by grp) would be:
grp v1 v2 v1_ordered v2_ordered
--- -- -- ---------- ----------
2 5 7 4 7
2 4 9 5 9
3 10 2 10 1
3 11 1 11 2
One way to do this is to create two tables and CROSS JOIN them. However, I'm working with too many rows of data for this to be computationally tractable. Is there a way to do this within a single query, without a JOIN?
Basically, I'd like to write SQL like:
SELECT
*,
v1 OVER (PARTITION BY grp ORDER BY v1 ASC) as v1_ordered,
v2 OVER (PARTITION BY grp ORDER BY v2 ASC) as v2_ordered
FROM [example_table]
This breaks the meaning of a table row, but it's a necessary feature for many applications, for example computing the ordered correlation between two fields, CORR(v1_ordered, v2_ordered).
Is this possible?
I think you are heading in the right direction! You just need to use the proper window function, ROW_NUMBER() in this case, and it should work.
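As a minimal sketch of what ROW_NUMBER() contributes here: the query below assigns each row one position per column within its group. It assumes BigQuery standard SQL, and the inline example_table CTE is only a stand-in that mirrors the sample rows from the question.

```sql
-- Sketch in standard SQL; the CTE reproduces the question's sample rows.
WITH example_table AS (
  SELECT 2 AS grp, 5 AS v1, 7 AS v2 UNION ALL
  SELECT 2, 4, 9 UNION ALL
  SELECT 3, 10, 2 UNION ALL
  SELECT 3, 11, 1
)
SELECT
  grp,
  v1,
  v2,
  -- Position of this row's v1 among the v1 values of its group.
  ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1_order,
  -- Position of this row's v2 among the v2 values of its group.
  ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2_order
FROM example_table;
```

For the row (2, 5, 7), for instance, v1_order is 2 and v2_order is 1. The two positions are what make it possible to pair the k-th smallest v1 with the k-th smallest v2.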
Adding a working example as per @cgn's request:
I don't think there is a way to totally avoid the use of JOIN.
At the same time, the example below uses just ONE JOIN versus the TWO JOINs in the other answers:
SELECT
  a.grp AS grp,
  a.v1 AS v1,
  a.v2 AS v2,
  a.v1 AS v1_ordered,
  b.v2 AS v2_ordered
FROM (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
  FROM [example_table]
) AS a
JOIN (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
  FROM [example_table]
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order
Result is as expected:
grp v1 v2 v1_ordered v2_ordered
2 4 9 4 7
2 5 7 5 9
3 10 2 10 1
3 11 1 11 2
And now you can use CORR() as below:
SELECT grp, CORR(v1_ordered, v2_ordered) AS [corr]
FROM (
  SELECT
    a.grp AS grp,
    a.v1 AS v1,
    a.v2 AS v2,
    a.v1 AS v1_ordered,
    b.v2 AS v2_ordered
  FROM (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
    FROM [example_table]
  ) AS a
  JOIN (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
    FROM [example_table]
  ) AS b
  ON a.grp = b.grp AND a.v1_order = b.v2_order
)
GROUP BY grp
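The bracketed [example_table] reference suggests legacy SQL. If you are on BigQuery standard SQL instead, a rough equivalent might look like the sketch below. The table path `my_dataset.example_table` and the alias corr_ordered are hypothetical placeholders, and the CTE mainly removes the duplicated subquery text; it is not guaranteed to reduce the work the engine does.

```sql
-- Sketch in standard SQL; `my_dataset.example_table` is a hypothetical path.
WITH ranked AS (
  SELECT
    grp,
    v1,
    v2,
    ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1_order,
    ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2_order
  FROM `my_dataset.example_table`
)
SELECT
  a.grp AS grp,
  -- Correlate the k-th smallest v1 with the k-th smallest v2 in each group.
  CORR(a.v1, b.v2) AS corr_ordered
FROM ranked AS a
JOIN ranked AS b
  ON a.grp = b.grp AND a.v1_order = b.v2_order
GROUP BY grp;
```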
This will work for you.
Note: the row order shown in your sample is not necessarily the order in which the database returns the rows. In my case, for v1 I got 4,5,10,11, unlike your 5,4,10,11. However, your output will be the same as you wanted.
SELECT t.grp, t.v1, t.v2,
       v1.v1 AS v1_ordered, v2.v2 AS v2_ordered
FROM (
  SELECT t1.*,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1o,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2o
  FROM table1 t1
) t
INNER JOIN (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1o
  FROM table1 t
) v1
  ON t.grp = v1.grp
 AND t.v1o = v1.v1o
INNER JOIN (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2o
  FROM table1 t
) v2
  ON t.grp = v2.grp
 AND t.v1o = v2.v2o
Output:
+------+-----+-----+-------------+------------+
| grp | v1 | v2 | v1_ordered | v2_ordered |
+------+-----+-----+-------------+------------+
| 2 | 4 | 9 | 4 | 7 |
| 2 | 5 | 7 | 5 | 9 |
| 3 | 10 | 2 | 10 | 1 |
| 3 | 11 | 1 | 11 | 2 |
+------+-----+-----+-------------+------------+
I'm not 100% sure this works in BigQuery, but here goes:
select e.*, ev1.v1 as v1_ordered, ev2.v2 as v2_ordered
from (select e.*,
             row_number() over (partition by grp order by v1) as seqnum_v1,
             row_number() over (partition by grp order by v2) as seqnum_v2
      from example e
     ) e join
     (select e.*, row_number() over (partition by grp order by v1) as seqnum_v1
      from example e
     ) ev1
     on ev1.grp = e.grp and ev1.seqnum_v1 = e.seqnum_v1 join
     (select e.*, row_number() over (partition by grp order by v2) as seqnum_v2
      from example e
     ) ev2
     -- match on e.seqnum_v1 so the k-th smallest v2 pairs with the k-th smallest v1;
     -- matching on e.seqnum_v2 would simply return each row's own v2
     on ev2.grp = e.grp and ev2.seqnum_v2 = e.seqnum_v1;
The idea is to assign an independent ordering to each of the columns, then join back to the original table to get the actual values.
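Since the question asks for a single query without a table JOIN, here is one alternative that is not taken from the answers above: collect each column into its own sorted array per group, then read both arrays at the same offset. This is only a sketch, assuming BigQuery standard SQL and columns without NULLs (ARRAY_AGG raises an error if the output array would contain a NULL). The names sorted, v1_sorted, v2_sorted and pos are made up for illustration.

```sql
-- Sketch in standard SQL; the first CTE reproduces the question's sample rows.
WITH example_table AS (
  SELECT 2 AS grp, 5 AS v1, 7 AS v2 UNION ALL
  SELECT 2, 4, 9 UNION ALL
  SELECT 3, 10, 2 UNION ALL
  SELECT 3, 11, 1
),
sorted AS (
  SELECT
    grp,
    ARRAY_AGG(v1 ORDER BY v1) AS v1_sorted,  -- v1 values of the group, ascending
    ARRAY_AGG(v2 ORDER BY v2) AS v2_sorted   -- v2 values of the group, ascending
  FROM example_table
  GROUP BY grp
)
SELECT
  s.grp,
  v1_ordered,
  -- Pick the v2 value sitting at the same position as the current v1 value.
  s.v2_sorted[OFFSET(pos)] AS v2_ordered
FROM sorted AS s
CROSS JOIN UNNEST(s.v1_sorted) AS v1_ordered WITH OFFSET AS pos;
```

On the sample data this pairs (4, 7) and (5, 9) for grp 2, and (10, 1) and (11, 2) for grp 3. The CROSS JOIN UNNEST only expands each row's own array, so the table is never joined to itself. The trade-offs are that the original v1 and v2 columns are no longer attached to each output row, and that very large groups may run into array or row size limits, so it is worth testing on real data volumes.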