BIGQUERY - UNPARTITIONED partition in an integer range partitioned table

I tried to write data into my integer range partitioned table, using this article as my reference. My table has two columns, customer_id (INT) and product_name (STRING), and I partition on customer_id. However, the article I read says:

For streaming, data in the streaming buffer is in the UNPARTITIONED partition. When the data is extracted, it initially stays in the UNPARTITIONED partition. When there is enough unpartitioned data, it will be repartitioned into the specific partitions.

The phrase "When there is enough unpartitioned data" really confuses me, because I don't know how many rows count as "enough" in this context: 5000 rows? 10000 rows? Is there any way to find out?

In my demo, the table was empty at the start. I streamed 4000 rows (all with the same customer_id) into it and waited until the data had left the streaming buffer (more precisely, I waited until the Streaming buffer statistics disappeared from the Details of my table). Then I used this query to see how many partitions I had:

#legacySQL
select table_id, partition_id
from [mydataset.customer_product$__PARTITIONS_SUMMARY__] 

And the result is:

|---------------------|------------------|
|      table_id       |   partition_id   |
|---------------------|------------------|
|  customer_product   |__UNPARTITIONED__ |
|---------------------|------------------|

So what is the problem?
Moreover, if I overwrite my table using the query below, with the Query settings changed to overwrite the destination table,

-- change some settings in Query settings to overwrite the table
select *
from mydataset.customer_product

I get the following (19265786 is the customer_id shared by the 4000 rows):

|---------------------|------------------|
|      table_id       |   partition_id   |
|---------------------|------------------|
|  customer_product   |     19265786     |
|---------------------|------------------|

So now I have one partition, which is what I wanted, but I have no idea why. Could someone please explain this behavior?

I'm adding some information that I hope helps address your concerns:

1. Like 5000 rows or 10000 rows? Do we have any chance to know that please?

Based on Checking for data availability, the buffer is time-based rather than size-based, and the data can take up to 90 minutes to become available. In addition, the UNPARTITIONED partition contains all the data associated with the streaming buffer, so querying that partition is one way to find out how many rows are still in the buffer.

2. ... and I waited until my data is out of streaming buffer (actually I waited until the Streaming buffer statistics disappear in the Details of my table) So what is the problem, please?

There could be a synchronization issue, though I wonder how you determined that the streaming buffer was empty. The documentation says you need to "check the tables.get response for a section named streamingBuffer". Additionally, the streamingBuffer.oldestEntryTime field can be used to identify the age of the rows in the streaming buffer. You are right that the UNPARTITIONED partition exists as long as the service has not yet extracted the data into its final partition.
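As a rough illustration of that check, here is a minimal Python sketch. It assumes you have already fetched the tables.get response and parsed it into a dict. The helper names has_streaming_buffer and oldest_entry_age_seconds are hypothetical, and the sketch assumes oldestEntryTime arrives as a string of milliseconds since the epoch. It does not call BigQuery itself.

```python
from datetime import datetime, timezone
from typing import Optional


def has_streaming_buffer(table_resource: dict) -> bool:
    """True if the tables.get response contains a streamingBuffer section."""
    return "streamingBuffer" in table_resource


def oldest_entry_age_seconds(
    table_resource: dict, now: Optional[datetime] = None
) -> Optional[float]:
    """Age in seconds of the oldest buffered row, or None if it is not reported."""
    buffer_info = table_resource.get("streamingBuffer") or {}
    raw = buffer_info.get("oldestEntryTime")
    if raw is None:
        return None
    # Assumption: the value is milliseconds since the epoch, serialized as a string.
    oldest = datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - oldest).total_seconds()
```

If has_streaming_buffer returns False for a fresh tables.get response, the response no longer reports a buffer. If it returns True, the age tells you how long the oldest row has been waiting.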

3. one partition which is good but I have no idea why, please? Could you guys please explain to me this problem?

I agree with Hua Zhang: while you were streaming, the data was buffered in the UNPARTITIONED partition, which is why that was the only partition you saw (after some time, up to 90 minutes, the data is delivered to the proper partition). However, when you loaded the data directly into the table, the rows were sent straight to the proper partition (19265786).
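To make the "proper partition" concrete, here is a small Python sketch of how an integer range partitioned table maps a value to a partition_id. The question does not show the table's partitioning settings, so the start, end and interval values used below are assumptions for illustration only. The function partition_id_for is a hypothetical helper, not a BigQuery API. It follows the documented rules as I understand them: start is inclusive, end is exclusive, out-of-range values land in __UNPARTITIONED__, and NULL values land in __NULL__.

```python
from typing import Optional


def partition_id_for(value: Optional[int], start: int, end: int, interval: int) -> str:
    """Return the partition_id an integer range partitioned table would use."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        # Out-of-range values share the partition name used for buffered rows.
        return "__UNPARTITIONED__"
    # The partition is named after the first value of its bucket.
    bucket_start = start + ((value - start) // interval) * interval
    return str(bucket_start)


# Assumed settings: range [19265000, 19266000) with interval 1.
print(partition_id_for(19265786, 19265000, 19266000, 1))
```

Note that __UNPARTITIONED__ can therefore hold two kinds of rows: rows still waiting in the streaming buffer, and rows whose customer_id falls outside the configured range. In your case the second kind does not apply, because the same rows ended up in partition 19265786 after the overwrite.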

You might be interested in the article Life of a BigQuery streaming insert, which covers streaming and partitioned tables in more detail.
