
Google BigQuery: streaming & clustering at the same time

I have a process that streams data into a few different ingestion-time partitioned tables. I am trying to replace them with clustered equivalents.

Encouraged by an excellent article, I've started working on query performance. I've created new tables with the corresponding schema and suitable clustering fields, and set up streaming into them.

For context, I ran some tests earlier with tables populated by load jobs, and queries got the boost. After two days of streaming, however, I see no gain from the new setup. As I understand from the topic, the other one, and the issue, clustering combined with streaming doesn't give any extra gain without additional effort. Am I right? I thought about systematically re-clustering the previous day's partition, but that still gives no gain when querying the most recent data.

What would be the best way to make these two features work together to improve query performance? Is there a way to re-cluster data that has no real key to use in a DML MERGE statement?

The idea of clustered partitioned tables is that, once some of your data has already been clustered, you only have to run a

SELECT *

that overwrites the table (or a single partition of it), and the data added after the last clustering will be put in order.

After that, queries against your BigQuery data will be more efficient.
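As a rough sketch of the SELECT * rewrite described above, the Python helper below only assembles a `bq query` command that overwrites one daily partition with its own contents. It assumes a daily ingestion-time partitioned table (so the `_PARTITIONDATE` pseudo-column and the `$YYYYMMDD` partition decorator apply) and that the partition is no longer receiving streamed rows. The function name and the project, dataset, and table names are hypothetical.

```python
from datetime import date


def recluster_partition_command(project, dataset, table, day):
    """Build a `bq query` command that rewrites one daily partition in place.

    Hypothetical helper: it only assembles the argument list and runs nothing.
    """
    # SELECT * does not return the _PARTITIONDATE pseudo-column, so the
    # result has the same schema as the table it is written back to.
    query = (
        f"SELECT * FROM `{project}.{dataset}.{table}` "
        f"WHERE _PARTITIONDATE = DATE '{day.isoformat()}'"
    )
    # The $YYYYMMDD decorator targets a single partition; --replace
    # overwrites that partition with the query result.
    destination = f"{project}:{dataset}.{table}${day.strftime('%Y%m%d')}"
    return [
        "bq", "query",
        "--use_legacy_sql=false",
        "--replace",
        f"--destination_table={destination}",
        query,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from a daily job for the previous day's partition. No key is needed, because the partition is rewritten wholesale rather than merged row by row. It does not help with the current day's partition while rows are still being streamed into it.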
