
Partitioning in a BigQuery view is not preserved when creating a table

I'm trying to run:

SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t

in BigQuery, which shows the appropriate result. But the problem is that when I create a table from the same query, the order becomes a mess and is not preserved.

CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t

In addition, I tried:

CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ORDER BY t.Barcode, t.Country_Code) AS seqnum_c
FROM t

And got the same result. Have you ever faced the same issue?

You need to specify how you want the rows within the partition to be ordered in order for it to be deterministic.

It looks like you attempted to do this in your second example, but you used ORDER BY t.Barcode, t.Country_Code, which are exactly your partition columns. That means that within each partition, every row already has exactly the same Barcode and Country_Code, so effectively no ordering happens.

For example, given the following rows:

Barcode  Country_Code  Timestamp
111      USA           12345
111      USA           12346
111      JP            12350

You are partitioning by Barcode and Country_Code, so the first two rows will be part of the same partition. However, since you don't specify an order, you cannot know which row will get which row number. In the example above, it would make sense to ORDER BY Timestamp, but without knowing your data or your goals it's hard to say what the right logic is for you.

In short, you need to specify an ORDER BY column that is not a part of the PARTITION BY columns in order to deterministically order the rows within each partition.
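As a rough sketch of what that could look like, here is the query from the question with the window ordered by the Timestamp column from the sample rows above. That column is an assumption for illustration only; your table may need a different ordering column, and if the column can contain ties within a partition you would need a further tie-breaker for the numbering to be fully deterministic.

```sql
-- Sketch only: assumes t has a Timestamp column, as in the sample rows above.
CREATE OR REPLACE TABLE `test_2` AS
SELECT
  t.*,
  ROW_NUMBER() OVER (
    PARTITION BY t.Barcode, t.Country_Code  -- rows with the same barcode and country share a partition
    ORDER BY t.Timestamp                    -- a non-partition column decides the numbering inside it
  ) AS seqnum_c
FROM t;
```

With the three sample rows, the two (111, USA) rows would be numbered 1 and 2 in Timestamp order, and the (111, JP) row would get 1 in its own partition.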

Thanks @ken for your response. I guess I found my answer, which is:

CREATE OR REPLACE TABLE t
AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ORDER BY Barcode, Country_Code) AS seqnum_c
FROM t
ORDER BY Barcode, Country_Code, seqnum_c);

Best
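One caveat worth adding: BigQuery does not guarantee the order of rows returned by a query that has no ORDER BY, so an ORDER BY used while creating the table should not be relied on when the table is read back later. A minimal sketch, assuming the table t and the seqnum_c column created above, is to apply the ordering at read time instead:

```sql
-- Sketch only: the order is requested when reading, not when the table is created.
SELECT *
FROM t
ORDER BY Barcode, Country_Code, seqnum_c;
```

The stored seqnum_c values stay the same either way; only the order in which rows are displayed depends on this ORDER BY.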
