Google BigQuery SQL: Order two columns independently

Say I have some data like:

grp   v1   v2
---   --   --
 2    5    7
 2    4    9
 3    10   2
 3    11   1

I'd like to create new columns whose values do not depend on the row order of the table: each column is sorted independently of the other (v1 independently of v2), while partitioning by grp.

The result (independently ordered, partitioned by grp) would be:

grp   v1   v2  v1_ordered v2_ordered
---   --   --  ---------- ----------
 2    5    7       4          7
 2    4    9       5          9
 3    10   2      10          1
 3    11   1      11          2

One way to do this is to create two tables and CROSS JOIN them. However, I'm working with too many rows for that to be computationally tractable. Is there a way to do this within a single query, without a JOIN?

Basically, I'd like to write SQL like:

SELECT
  *,
  v1 OVER (PARTITION BY grp ORDER BY v1 ASC) as v1_ordered,
  v2 OVER (PARTITION BY grp ORDER BY v2 ASC) as v2_ordered
FROM [example_table]

This breaks the row-level meaning of the table, but it's a necessary feature for many applications, for example computing the ordered correlation between two fields: CORR(v1_ordered, v2_ordered).

Is this possible?

I think you are heading in the right direction! You just need to use the proper window function, ROW_NUMBER() in this case, and it should work.

Adding a working example as per @cgn's request:
I don't think there is a way to avoid a JOIN entirely.
That said, the example below uses just one JOIN, versus the two JOINs in the other answers:

SELECT 
  a.grp AS grp, 
  a.v1 AS v1, 
  a.v2 AS v2, 
  a.v1 AS v1_ordered, 
  b.v2 AS v2_ordered 
FROM (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
  FROM [example_table]
) AS a
JOIN (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
  FROM [example_table]
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order 

The result is as expected:

grp v1  v2  v1_ordered  v2_ordered   
2    4   9           4           7   
2    5   7           5           9   
3   10   2          10           1   
3   11   1          11           2   
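
The query can be checked locally before running it on a large table. The sketch below is a stand-in, not BigQuery itself: it assumes an in-memory SQLite database (version 3.25 or later, for window functions), loads the four sample rows, drops the legacy `[...]` brackets around the table name, and adds an ORDER BY (not in the original query) so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here (an assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (grp INTEGER, v1 INTEGER, v2 INTEGER)")
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)],
)

# Same one-JOIN shape as the query above: rank v1 and v2 separately
# within each grp, then pair rows that share the same rank.
QUERY = """
SELECT a.grp, a.v1, a.v2, a.v1 AS v1_ordered, b.v2 AS v2_ordered
FROM (
  SELECT grp, v1, v2,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v1) AS v1_order
  FROM example_table
) AS a
JOIN (
  SELECT grp, v2,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY v2) AS v2_order
  FROM example_table
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order
ORDER BY a.grp, a.v1_order  -- added only to make the printout stable
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The printed tuples should line up with the result table above, column for column.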

And now you can use CORR() as below:

SELECT grp, CORR(v1_ordered, v2_ordered) AS [corr]
FROM (
  SELECT 
    a.grp AS grp, 
    a.v1 AS v1, 
    a.v2 AS v2, 
    a.v1 AS v1_ordered, 
    b.v2 AS v2_ordered 
  FROM (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
    FROM [example_table]
  ) AS a
  JOIN (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
    FROM [example_table]
  ) AS b
  ON a.grp = b.grp AND a.v1_order = b.v2_order
)
GROUP BY grp
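
To see what the grouped CORR() call should return for the sample data, here is a minimal sketch in plain Python. It assumes CORR computes the Pearson coefficient; the `pearson` helper and the `rows` list (the v1_ordered and v2_ordered values from the result table above) are illustrative, not part of the original answer.

```python
from collections import defaultdict
from math import sqrt

# (grp, v1_ordered, v2_ordered), copied from the result table above.
rows = [(2, 4, 7), (2, 5, 9), (3, 10, 1), (3, 11, 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Collect the ordered pairs per group, mirroring GROUP BY grp.
by_grp = defaultdict(lambda: ([], []))
for grp, v1_ordered, v2_ordered in rows:
    by_grp[grp][0].append(v1_ordered)
    by_grp[grp][1].append(v2_ordered)

corr_by_grp = {grp: pearson(xs, ys) for grp, (xs, ys) in by_grp.items()}
print(corr_by_grp)
```

With only two rows per group, any two independently sorted columns are perfectly correlated, so both groups come out at 1.0 here. Real data with more rows per group gives a more informative value.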

This will work for you.

SQLFiddle Demo in SQL Server

Note: The row order shown in your sample is not necessarily the order in which the database returns the rows. In my case, for v1, I got 4,5,10,11 rather than your 5,4,10,11. However, the output will be the same as you wanted.

Select t.grp,t.v1,t.v2,
v1.v1 as v1_ordered,v2.v2 as v2_ordered
From
(
    select t1.*,
    row_number() over (partition by grp
                   Order by v1) v1o
    ,
    row_number() over (partition by grp
                   Order by v2) v2o
    from table1 t1
) t
Inner join
(
    Select t.*,
    row_number() over (partition by grp
                   Order by v1) v1o
    From table1 t
) v1
On t.grp=v1.grp
And t.v1o=v1.v1o
Inner join
(
    Select t.*,
    row_number() over (partition by grp
                   Order by v2) v2o
    From table1 t
) v2
On t.grp=v2.grp
And t.v1o=v2.v2o

Output:

+------+-----+-----+-------------+------------+
| grp  | v1  | v2  | v1_ordered  | v2_ordered |
+------+-----+-----+-------------+------------+
|   2  |  4  |  9  |          4  |          7 |
|   2  |  5  |  7  |          5  |          9 |
|   3  | 10  |  2  |         10  |          1 |
|   3  | 11  |  1  |         11  |          2 |
+------+-----+-----+-------------+------------+

I'm not 100% sure this works in BigQuery, but here goes:

select e.*, ev1.v1 as v1_ordered, ev2.v2 as v2_ordered
from (select e.*,
             row_number() over (partition by grp order by v1) as seqnum_v1,
             row_number() over (partition by grp order by v2) as seqnum_v2
      from example e
     ) e join
     (select e.*, row_number() over (partition by grp order by v1) as seqnum_v1
      from example e
     ) ev1
     on ev1.grp = e.grp and ev1.seqnum_v1 = e.seqnum_v1 join
     (select e.*, row_number() over (partition by grp order by v2) as seqnum_v2
      from example e
     ) ev2
     -- pair on v1's rank so the k-th smallest v2 lands beside the k-th smallest v1
     on ev2.grp = e.grp and ev2.seqnum_v2 = e.seqnum_v1;

The idea is to assign an independent ordering to each of the columns, then join back to the original table to get the actual values.
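
The rank-then-join idea can be sketched outside SQL as well. The snippet below is an illustration in plain Python, not something BigQuery runs: `rank_column` is a hypothetical helper that plays the role of ROW_NUMBER() OVER (PARTITION BY grp ORDER BY col), and the dictionary lookup on (grp, rank) plays the role of the join.

```python
from collections import defaultdict

# Sample rows as (grp, v1, v2), in the order given in the question.
rows = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]


def rank_column(rows, col):
    """Map (grp, rank) -> value, ranking column `col` within each grp.

    Ranks are 1-based, like ROW_NUMBER(); ties are broken arbitrarily.
    """
    values = defaultdict(list)
    for row in rows:
        values[row[0]].append(row[col])
    return {
        (grp, rank): value
        for grp, vals in values.items()
        for rank, value in enumerate(sorted(vals), start=1)
    }


v1_by_rank = rank_column(rows, 1)
v2_by_rank = rank_column(rows, 2)

# "Join" on (grp, rank): the k-th smallest v1 meets the k-th smallest v2.
paired = [
    (grp, v1_by_rank[grp, rank], v2_by_rank[grp, rank])
    for grp, rank in sorted(v1_by_rank)
]
print(paired)
```

Each tuple in `paired` is (grp, v1_ordered, v2_ordered), which is the pairing the SQL answers produce through their joins.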
