Google BigQuery SQL: Order two columns independently

Question

Say I have some data like:

grp   v1   v2
---   --   --
 2    5    7
 2    4    9
 3    10   2
 3    11   1

I'd like to create new columns that are independent of the table's row order, such that the two columns are ordered independently: sort v1 independently of v2, while partitioning by grp.

The result (independently ordered, partitioned by grp) would be:

grp   v1   v2  v1_ordered v2_ordered
---   --   --  ---------- ----------
 2    5    7       4          7
 2    4    9       5          9
 3    10   2      10          1
 3    11   1      11          2
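A small in-memory sketch in Python (not BigQuery) that reproduces the table above may help pin down the intent. It assumes the k-th row of each group, in the order the rows are listed, receives the k-th smallest v1 and the k-th smallest v2 of that group; `order_independently` and `ROWS` are illustrative names. A SQL table has no inherent row order, so a real query has to pick one.

```python
from collections import defaultdict

# Sample rows from the question, in the order they are listed: (grp, v1, v2).
ROWS = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]


def order_independently(rows):
    """Attach v1_ordered and v2_ordered to each row.

    Assumption: within each grp, v1 and v2 are sorted separately, and the
    k-th row of the group (in input order) receives the k-th smallest v1
    and the k-th smallest v2 of that group.
    """
    v1_by_grp = defaultdict(list)
    v2_by_grp = defaultdict(list)
    for grp, v1, v2 in rows:
        v1_by_grp[grp].append(v1)
        v2_by_grp[grp].append(v2)
    for grp in v1_by_grp:
        v1_by_grp[grp].sort()
        v2_by_grp[grp].sort()

    seen = defaultdict(int)  # how many rows of each grp have been emitted so far
    out = []
    for grp, v1, v2 in rows:
        k = seen[grp]
        seen[grp] += 1
        out.append((grp, v1, v2, v1_by_grp[grp][k], v2_by_grp[grp][k]))
    return out


for row in order_independently(ROWS):
    print(row)
```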

One way to do this is to create two tables and CROSS JOIN them. However, I'm working with too many rows of data for this to be computationally tractable. Is there a way to do this within a single query without a JOIN?

Basically, I'd like to write SQL like:

SELECT
  *,
  v1 OVER (PARTITION BY grp ORDER BY v1 ASC) as v1_ordered,
  v2 OVER (PARTITION BY grp ORDER BY v2 ASC) as v2_ordered
FROM [example_table]

This breaks the meaning of table rows, but it's a necessary feature for many applications, for example computing the ordered correlation between two fields: CORR(v1_ordered, v2_ordered).
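A minimal Python sketch of the "ordered correlation" idea, assuming it means Pearson correlation (which is what CORR computes) applied to each group's independently sorted columns. `pearson` and `raw_and_ordered_corr` are hypothetical helper names, not BigQuery functions. On the sample data each group's raw pairs correlate at -1, while the sorted columns correlate at +1.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:  # a constant column has no defined correlation
        return None
    return sxy / math.sqrt(sxx * syy)


def raw_and_ordered_corr(rows):
    """Per grp: (correlation of the row pairs, correlation after sorting each column)."""
    groups = defaultdict(lambda: ([], []))
    for grp, v1, v2 in rows:
        groups[grp][0].append(v1)
        groups[grp][1].append(v2)
    return {
        grp: (pearson(v1s, v2s), pearson(sorted(v1s), sorted(v2s)))
        for grp, (v1s, v2s) in groups.items()
    }


rows = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]
print(raw_and_ordered_corr(rows))
```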

Is this possible?

Answer 1

I think you are heading in the right direction! You just need to use the proper window function, ROW_NUMBER() in this case, and it should work.
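ROW_NUMBER() numbers the rows of each partition 1, 2, 3, ... following the ORDER BY. The Python emulation below (illustrative only; `row_number` is a hypothetical helper) shows the two independent numberings on the sample data: within a group, the k-th smallest v1 and the k-th smallest v2 both get number k, and that shared number is what the query below joins on.

```python
from collections import defaultdict


def row_number(rows, partition_key, order_key):
    """Emulate ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).

    Returns 1-based row numbers aligned with `rows`. Ties are broken by
    input position here; in SQL the order among ties is not guaranteed.
    """
    numbers = [0] * len(rows)
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[partition_key(row)].append(i)
    for indices in partitions.values():
        # sorted() is stable, so equal order keys keep their input order.
        ordered = sorted(indices, key=lambda j: order_key(rows[j]))
        for rank, i in enumerate(ordered, start=1):
            numbers[i] = rank
    return numbers


rows = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]  # (grp, v1, v2)
print(row_number(rows, lambda r: r[0], lambda r: r[1]))  # numbered by v1
print(row_number(rows, lambda r: r[0], lambda r: r[2]))  # numbered by v2
```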

Adding a working example as per @cgn's request:
I don't think there is a way to avoid using a JOIN entirely.
At the same time, the example below uses just ONE JOIN, versus TWO JOINs in the other answers:

SELECT 
  a.grp AS grp, 
  a.v1 AS v1, 
  a.v2 AS v2, 
  a.v1 AS v1_ordered, 
  b.v2 AS v2_ordered 
FROM (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
  FROM [example_table]
) AS a
JOIN (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
  FROM [example_table]
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order
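One way to check Answer 1's query locally is to run it in SQLite through Python's `sqlite3` module. This is a sketch under assumptions: SQLite stands in for BigQuery (window functions need SQLite 3.25 or later), `[example_table]` is written as a plain name, and an ORDER BY is appended because SQL does not otherwise guarantee the order of returned rows.

```python
import sqlite3

# Answer 1's query; [example_table] is written as a plain name and an
# ORDER BY is appended so the output order is deterministic.
QUERY = """
SELECT
  a.grp AS grp,
  a.v1 AS v1,
  a.v2 AS v2,
  a.v1 AS v1_ordered,
  b.v2 AS v2_ordered
FROM (
  SELECT grp, v1, v2, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1_order
  FROM example_table
) AS a
JOIN (
  SELECT grp, v1, v2, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2_order
  FROM example_table
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order
ORDER BY a.grp, a.v1
"""


def run(query):
    """Load the question's sample rows into an in-memory SQLite DB and run `query`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE example_table (grp INTEGER, v1 INTEGER, v2 INTEGER)")
    conn.executemany(
        "INSERT INTO example_table VALUES (?, ?, ?)",
        [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)],
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in run(QUERY):
    print(row)
```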

The result is as expected:

grp v1  v2  v1_ordered  v2_ordered   
2    4   9           4           7   
2    5   7           5           9   
3   10   2          10           1   
3   11   1          11           2

And now you can use CORR() as below:

SELECT grp, CORR(v1_ordered, v2_ordered) AS [corr]
FROM (
  SELECT 
    a.grp AS grp, 
    a.v1 AS v1, 
    a.v2 AS v2, 
    a.v1 AS v1_ordered, 
    b.v2 AS v2_ordered 
  FROM (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
    FROM [example_table]
  ) AS a
  JOIN (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
    FROM [example_table]
  ) AS b
  ON a.grp = b.grp AND a.v1_order = b.v2_order
)
GROUP BY grp
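SQLite has no CORR aggregate, but `sqlite3`'s `create_aggregate` can register one, which makes it possible to run a version of the query above locally. The `Corr` class is a hypothetical stand-in that computes the Pearson correlation from running sums; the query is trimmed to the columns CORR needs, and an ORDER BY is added for a deterministic result. With this sample data the correlation of the sorted columns comes out as 1.0 for both groups.

```python
import math
import sqlite3


class Corr:
    """Pearson correlation as a SQLite aggregate (SQLite has no built-in CORR)."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def step(self, x, y):
        if x is None or y is None:  # skip pairs that contain a NULL
            return
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def finalize(self):
        if self.n < 2:
            return None
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:  # a constant column has no defined correlation
            return None
        return (self.n * self.sxy - self.sx * self.sy) / math.sqrt(var_x * var_y)


# Answer 1's CORR query, trimmed to the needed columns, with a plain table name.
QUERY = """
SELECT grp, CORR(v1_ordered, v2_ordered) AS [corr]
FROM (
  SELECT a.grp AS grp, a.v1 AS v1_ordered, b.v2 AS v2_ordered
  FROM (
    SELECT grp, v1, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1_order
    FROM example_table
  ) AS a
  JOIN (
    SELECT grp, v2, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2_order
    FROM example_table
  ) AS b
  ON a.grp = b.grp AND a.v1_order = b.v2_order
)
GROUP BY grp
ORDER BY grp
"""


def run_corr():
    """Register CORR, load the sample rows, and return [(grp, corr), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("CORR", 2, Corr)
    conn.execute("CREATE TABLE example_table (grp INTEGER, v1 INTEGER, v2 INTEGER)")
    conn.executemany(
        "INSERT INTO example_table VALUES (?, ?, ?)",
        [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(run_corr())
```

The running-sums formula is the simplest to write, but it can lose precision on large values; a two-pass or Welford-style update would be more robust.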

Answer 2

This will work for you.

SQLFiddle Demo in SQL Server

Note: the row order shown in your sample is not necessarily the order in which the database returns rows. In my case I got v1 as 4, 5, 10, 11, unlike your 5, 4, 10, 11. However, the output values are the same as you wanted.

SELECT t.grp, t.v1, t.v2,
       v1.v1 AS v1_ordered, v2.v2 AS v2_ordered
FROM (
    SELECT t1.*,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1o,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2o
    FROM table1 t1
) t
INNER JOIN (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1o
    FROM table1 t
) v1
    ON t.grp = v1.grp
   AND t.v1o = v1.v1o
INNER JOIN (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2o
    FROM table1 t
) v2
    ON t.grp = v2.grp
   AND t.v1o = v2.v2o
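Answer 2's query can also be run in SQLite via Python's `sqlite3` (SQLite standing in for SQL Server, with an ORDER BY appended). Because the v1 join matches t.v1o against the same v1 ranking, v1.v1 always equals t.v1, so the hypothetical `ONE_JOIN` variant drops that join, which is essentially Answer 1's single-JOIN query. Both should return the same rows.

```python
import sqlite3

SAMPLE = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]  # (grp, v1, v2)

# Answer 2's query, unchanged except for a final ORDER BY.
TWO_JOINS = """
SELECT t.grp, t.v1, t.v2, v1.v1 AS v1_ordered, v2.v2 AS v2_ordered
FROM (
  SELECT t1.*,
         row_number() OVER (PARTITION BY grp ORDER BY v1) v1o,
         row_number() OVER (PARTITION BY grp ORDER BY v2) v2o
  FROM table1 t1
) t
INNER JOIN (
  SELECT t.*, row_number() OVER (PARTITION BY grp ORDER BY v1) v1o
  FROM table1 t
) v1 ON t.grp = v1.grp AND t.v1o = v1.v1o
INNER JOIN (
  SELECT t.*, row_number() OVER (PARTITION BY grp ORDER BY v2) v2o
  FROM table1 t
) v2 ON t.grp = v2.grp AND t.v1o = v2.v2o
ORDER BY t.grp, t.v1
"""

# Hypothetical variant without the v1 join, since v1.v1 always equals t.v1.
ONE_JOIN = """
SELECT t.grp, t.v1, t.v2, t.v1 AS v1_ordered, v2.v2 AS v2_ordered
FROM (
  SELECT t1.*, row_number() OVER (PARTITION BY grp ORDER BY v1) v1o
  FROM table1 t1
) t
INNER JOIN (
  SELECT t.*, row_number() OVER (PARTITION BY grp ORDER BY v2) v2o
  FROM table1 t
) v2 ON t.grp = v2.grp AND t.v1o = v2.v2o
ORDER BY t.grp, t.v1
"""


def run(query):
    """Load SAMPLE into an in-memory table1 and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (grp INTEGER, v1 INTEGER, v2 INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", SAMPLE)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(TWO_JOINS) == run(ONE_JOIN))
```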

Output:

+------+-----+-----+-------------+------------+
| grp  | v1  | v2  | v1_ordered  | v2_ordered |
+------+-----+-----+-------------+------------+
|   2  |  4  |  9  |          4  |          7 |
|   2  |  5  |  7  |          5  |          9 |
|   3  | 10  |  2  |         10  |          1 |
|   3  | 11  |  1  |         11  |          2 |
+------+-----+-----+-------------+------------+

Answer 3

I'm not 100% sure this works in BigQuery, but here it goes:

select e.*, ev1.v1 as v1_ordered, ev2.v2 as v2_ordered
from (select e.*,
             row_number() over (partition by grp order by v1) as seqnum_v1
      from example e
     ) e join
     (select e.*, row_number() over (partition by grp order by v1) as seqnum_v1
      from example e
     ) ev1
     on ev1.grp = e.grp and ev1.seqnum_v1 = e.seqnum_v1 join
     (select e.*, row_number() over (partition by grp order by v2) as seqnum_v2
      from example e
     ) ev2
     -- match on the v1 number, so the k-th smallest v1 is paired with the k-th smallest v2
     on ev2.grp = e.grp and ev2.seqnum_v2 = e.seqnum_v1;

The idea is to assign an independent ordering (a row number) to each of the columns, then join back to the original table on those row numbers to get the actual values.
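The idea can be sketched in plain Python, with dictionaries keyed by (grp, row number) standing in for the numbered subqueries; `number_by` and `rank_and_join` are illustrative names. The key detail is that the v2 lookup uses the row's v1 number: joining each numbering back on its own number would just return the row's original v1 and v2.

```python
from collections import defaultdict

# Sample rows from the question: (grp, v1, v2).
rows = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]


def number_by(rows, col):
    """Return {(grp, k): row}, numbering rows 1..n by column `col` within each grp."""
    by_grp = defaultdict(list)
    for row in rows:
        by_grp[row[0]].append(row)
    numbered = {}
    for grp, grp_rows in by_grp.items():
        for k, row in enumerate(sorted(grp_rows, key=lambda r: r[col]), start=1):
            numbered[(grp, k)] = row
    return numbered


def rank_and_join(rows):
    """Pair the k-th smallest v1 with the k-th smallest v2 in each grp."""
    ev1 = number_by(rows, 1)  # independent numbering by v1
    ev2 = number_by(rows, 2)  # independent numbering by v2
    out = []
    for (grp, k), (_, v1, v2) in sorted(ev1.items()):
        # "Join" on (grp, k): fetch the v2-numbered row carrying the same number.
        out.append((grp, v1, v2, v1, ev2[(grp, k)][2]))
    return out


for row in rank_and_join(rows):
    print(row)
```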

3 answers

Answer 1: score 1, accepted, 2016-01-23 21:35:18
Answer 2: score 1, 2016-01-24 03:18:13
Answer 3: score 0, 2016-01-23 22:44:32
