Partitioning in a BigQuery view is not preserved when creating a table
I'm trying to run
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t
in BigQuery, which shows the appropriate result. The problem is that when I create a table from the same query, the rows come out jumbled and the order is not preserved.
CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t
In addition, I tried:
CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ORDER BY t.Barcode, t.Country_Code) AS seqnum_c
FROM t
And got the same result. Have you ever faced the same issue?
You need to specify how the rows within each partition should be ordered for the result to be deterministic. It looks like you attempted to do this in your second example, but you used ORDER BY t.Barcode, t.Country_Code, which are exactly your partition columns. That means that within each partition, every row already has exactly the same Barcode and Country_Code, so effectively no ordering happens.
For example, given the following rows:
Barcode  Country_Code  Timestamp
111      USA           12345
111      USA           12346
111      JP            12350
You are partitioning by Barcode and Country_Code, so the first two rows will be part of the same partition. However, since you don't specify an order, you cannot know which row will get which row number. In the example above, it would make sense to ORDER BY Timestamp, but without knowing your data or your goals it's hard to say what the right logic is for you.
In short, you need to specify an ORDER BY column that is not one of the PARTITION BY columns in order to deterministically order the rows within each partition.
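As a minimal sketch of that advice, assuming your table has a Timestamp column like the one in the sample rows above (that column is an assumption taken from the example, not something known about your real data), the query could look like this. The second statement reflects the fact that BigQuery does not guarantee any row order when a table is read without an ORDER BY, so the ordering is requested at read time:

```sql
-- Assumption: t has a Timestamp column that differs between rows of the
-- same (Barcode, Country_Code) partition. Substitute whichever column
-- gives a meaningful order in your data.
CREATE OR REPLACE TABLE `test_2` AS
SELECT
  t.*,
  ROW_NUMBER() OVER (
    PARTITION BY t.Barcode, t.Country_Code
    ORDER BY t.Timestamp  -- varies within a partition, so numbering is deterministic
  ) AS seqnum_c
FROM t;

-- Ask for the order explicitly when reading the table back.
SELECT *
FROM `test_2`
ORDER BY Barcode, Country_Code, seqnum_c;
```

If two rows in the same partition can share a Timestamp, the numbering between them is still arbitrary; adding a further column to the window's ORDER BY would be needed to break such ties.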
Thanks @ken for your response. I think I found my answer, which is:
CREATE OR REPLACE TABLE t
AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ORDER BY Barcode, Country_Code) AS seqnum_c
FROM t
ORDER BY Barcode, Country_Code, seqnum_c);
Best