
How to improve frequent BigQuery reads?

I'm using the BigQuery client for Java to do small reads on a table with about 5 GB of data. The queries I run are standard SQL of the form SELECT foo FROM my-table WHERE bar=$1, where the result is at most one row. I need to do this at a high frequency, so performance is a big concern. How do I optimize for this?

I thought about pulling the entire data set periodically, since it's only 5 GB, but then again 5 GB sounds like a lot to constantly keep in memory.
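For reference, here is a minimal sketch of what that periodic pull could look like with the google-cloud-bigquery Java client. The my_dataset.my_table name is a placeholder, and the bar/foo column names are taken from the query above; keying the map on bar alone may keep the footprint well below the raw 5 GB if the other columns are wide.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import java.util.HashMap;
import java.util.Map;

public class TableCache {
    private final BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // Replaced wholesale on each refresh; volatile so readers always see a complete map.
    private volatile Map<String, String> byBar = new HashMap<>();

    // Call from a scheduled task (e.g. once a day, matching the write frequency).
    public void refresh() throws InterruptedException {
        QueryJobConfiguration config = QueryJobConfiguration
                .newBuilder("SELECT bar, foo FROM `my_dataset.my_table`")
                .build();
        Map<String, String> fresh = new HashMap<>();
        for (FieldValueList row : bigquery.query(config).iterateAll()) {
            fresh.put(row.get("bar").getStringValue(), row.get("foo").getStringValue());
        }
        byBar = fresh; // atomic reference swap, no locking needed for readers
    }

    // The hot path: an in-memory lookup instead of a BigQuery round trip.
    public String lookup(String bar) {
        return byBar.get(bar);
    }
}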

Running this query in the BigQuery console shows something like Query complete (0.6 sec elapsed, 4.2 GB processed). That is fast for 4.2 GB, but not fast enough. Again, I need to read from the table very frequently but write to it rarely (maybe once a day or week).

Maybe I could tell the server to cache the processed data somehow?

You don't have control over the cache layer in BigQuery; that is something the service manages for you automatically. Unfortunately, the typical cache lifetime is 24 hours, and the cached results are best-effort and may be invalidated sooner (official docs).
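If you want to confirm whether a given query was served from that cache, the Java client exposes the flag on the job statistics. A minimal sketch, reusing the table and query shape from the question (setUseQueryCache(true) is already the default and is shown only for clarity):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.JobStatistics.QueryStatistics;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class CacheCheck {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        QueryJobConfiguration config = QueryJobConfiguration
                .newBuilder("SELECT foo FROM `my_dataset.my_table` WHERE bar = 'x'")
                .setUseQueryCache(true) // allow cached results (the default)
                .build();
        Job job = bigquery.create(JobInfo.of(config)).waitFor();
        QueryStatistics stats = job.getStatistics();
        // true when BigQuery served the result from its ~24h result cache
        System.out.println("cache hit: " + stats.getCacheHit());
    }
}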

A query completing in 0.6 s seems to be good for BigQuery. I'm afraid that if you are looking for something faster, BigQuery may not be the right data warehouse for your use case.

BigQuery is built for analytical processing, not for interacting with individual rows. The best practice would be, as you mentioned, to hold a copy of the data in a place that allows quicker and more efficient reads of individual rows (like a MySQL database).
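As a sketch of what the serving side could look like against such a MySQL replica (the JDBC URL, credentials, and table name are all hypothetical, and the replica would be refreshed by your daily or weekly write job), a point lookup with an index on bar takes milliseconds instead of a full scan:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class RowLookup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "pass");
             PreparedStatement stmt = conn.prepareStatement(
                     // assumes an index on bar, e.g. CREATE INDEX idx_bar ON my_table (bar)
                     "SELECT foo FROM my_table WHERE bar = ?")) {
            stmt.setString(1, "some-value");
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    System.out.println(rs.getString("foo"));
                }
            }
        }
    }
}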

However, you can still drastically reduce the amount of data scanned by your query by clustering the table on the field you're filtering on:

https://cloud.google.com/bigquery/docs/creating-clustered-tables
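For example, a minimal sketch that rebuilds the table clustered on bar using DDL from the Java client (dataset and table names are placeholders). Once the data is physically ordered by bar, the point lookup from the question scans only the matching blocks instead of the full 4.2 GB:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class ClusterTable {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Rebuild the table clustered on the filter column.
        String ddl = "CREATE TABLE `my_dataset.my_table_clustered` "
                + "CLUSTER BY bar "
                + "AS SELECT * FROM `my_dataset.my_table`";
        bigquery.query(QueryJobConfiguration.newBuilder(ddl).build());
    }
}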
