简体   繁体   中英

Google BigQuery SQL: Order two columns independently

Say I have some data like:

grp   v1   v2
---   --   --
 2    5    7
 2    4    9
 3    10   2
 3    11   1

I'd like to create new columns which are independent of the ordering of the table - such that the two columns have independent orderings, ie sort by v1 independently of v2, while partitioning by grp.

The result (independently ordered, partitioned by grp) would be:

grp   v1   v2  v1_ordered v2_ordered
---   --   --  ---------- ----------
 2    5    7       4          7
 2    4    9       5          9
 3    10   2      10          1
 3    11   1      11          2

One way to do this is to create two tables and CROSS JOIN. However, I'm working with too many rows of data for this to be computationally tractable - is there a way to do this within a single query without a JOIN?

Basically, I'd like to write SQL like:

SELECT
  *,
  v1 OVER (PARTITION BY grp ORDER BY v1 ASC) as v1_ordered,
  v2 OVER (PARTITION BY grp ORDER BY v2 ASC) as v2_ordered
FROM [example_table]

This breaks table row meaning, but it's a necessary feature for many applications - for example computing ordered correlation between two fields CORR(v1_ordered, v2_ordered).

Is this possible?

I think you are in right direction! You just need to use proper window function . Row_number() in this case. And it should work!

Adding working example as per @cgn request:
I dont think there is way to totally avoid use of JOIN.
At the same time below example uses just ONE JOIN vs TWO JOIN s in other answers:

SELECT 
  a.grp AS grp, 
  a.v1 AS v1, 
  a.v2 AS v2, 
  a.v1 AS v1_ordered, 
  b.v2 AS v2_ordered 
FROM (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
  FROM [example_table]
) AS a
JOIN (
  SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
  FROM [example_table]
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order 

Result is as expected:

grp v1  v2  v1_ordered  v2_ordered   
2    4   9           4           7   
2    5   7           5           9   
3   10   2          10           1   
3   11   1          11           2   

And now you can use CORR() as below

SELECT grp, CORR(v1_ordered, v2_ordered) AS [corr]
FROM (
  SELECT 
    a.grp AS grp, 
    a.v1 AS v1, 
    a.v2 AS v2, 
    a.v1 AS v1_ordered, 
    b.v2 AS v2_ordered 
  FROM (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
    FROM [example_table]
  ) AS a
  JOIN (
    SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
    FROM [example_table]
  ) AS b
  ON a.grp = b.grp AND a.v1_order = b.v2_order
)
GROUP BY grp

This will work for you.

SQLFiddle Demo in SQL Server

Note: The sequence you mentioned in the sample, is not necessary how the rows are returned from database. In my case, for v1 , I got 4,5,10,11 unlike your 5,4,10,11 . However, your output will be same as you wanted.

Select t.grp,t.v1,t.v2,
v1.v1 as v1_ordered,v2.v2 as v2_ordered
From
(
    select t1.*,
    row_number() over (partition by grp
                   Order by v1) v1o
    ,
    row_number() over (partition by grp
                   Order by v2) v2o
    from table1 t1
) t
Inner join
(
    Select t.*,
    row_number() over (partition by grp
                   Order by v1) v1o
    From table1 t
) v1
On t.grp=v1.grp
And t.v1o=v1.v1o
Inner join
(
    Select t.*,
    row_number() over (partition by grp
                   Order by v2) v2o
    From table1 t
) v2
On t.grp=v2.grp
And t.v1o=v2.v2o

Output:

+------+-----+-----+-------------+------------+
| grp  | v1  | v2  | v1_ordered  | v2_ordered |
+------+-----+-----+-------------+------------+
|   2  |  4  |  9  |          4  |          7 |
|   2  |  5  |  7  |          5  |          9 |
|   3  | 10  |  2  |         10  |          1 |
|   3  | 11  |  1  |         11  |          2 |
+------+-----+-----+-------------+------------+

AI'm not 100% sure this works in BigQuery, but here is goes:

select e.*, ev1.v1, ev2.v2
from (select e.*,
             row_number() over (partition by grp order by v1) as seqnum_v1,
             row_number() over (partition by grp order by v2) as seqnum_v2
      from example e
     ) e join
     (select e.*, row_number() over (partition by grp order by v1) as seqnum_v1
      from example e
     ) ev1
     on ev1.grp = e.grp and ev1.seqnum_v1 = e.seqnum_v1 join
     (select e.*, row_number() over (partition by grp order by v2) as seqnum_v2
      from example e
     ) ev2
     on ev2.grp = e.grp and ev2.seqnum_v2 = e.seqnum_v2;

The idea is to assign an independent ordering to each of the columns. Then join back to the original table to get the actual value.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM