
BigQuery fitment for large read operations

This is to understand whether BigQuery is the right fit for large read operations. We have a transactional system outside the GCP platform that will make ~100 BigQuery API requests to read data from BigQuery views/tables, and each request typically returns only one record.

Can you please provide guidance on whether reading data directly from BigQuery is a good fit for such a use case, considering cost and performance? Do we need to transfer the data to Cloud SQL/Bigtable instead of reading directly from BQ? Or do you have any other suggestions?

Comparing Google BigQuery (a data warehouse which, through its internal query engine, lets you run SQL queries on data stored in its tables) to Google Cloud SQL (a managed cloud database service) feels a little off, because:

  1. Their intended purposes are different: Cloud SQL is an alternative to setting up your own infrastructure for any database needs. BigQuery, by contrast, does let you store data in tables organized into datasets, but this is intended simply as a way to easily access data that is expected to be used frequently in conjunction with other data sources and across multiple services or platforms - not as a replacement for your relational database in most cases.
  2. Usage of BigQuery and Cloud SQL is not mutually exclusive: as you hinted, you can transfer data to Cloud SQL and, through federated queries, use that data within BigQuery (see the sketch after this list).
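
As an illustration of that second point, here is a minimal sketch of a federated query issued through the Python BigQuery client. The connection ID, table, and column names are hypothetical; `EXTERNAL_QUERY` pushes the inner statement down to the Cloud SQL instance registered under that connection:

```python
from google.cloud import bigquery

client = bigquery.Client()

# The connection ID and column names below are placeholders, not from the question.
sql = """
    SELECT t.transaction_id, t.amount
    FROM EXTERNAL_QUERY(
        'my_project.us.my_cloudsql_connection',
        'SELECT transaction_id, amount FROM transactions'
    ) AS t
"""

# The outer query runs in BigQuery; the inner string executes on Cloud SQL.
for row in client.query(sql).result():
    print(row.transaction_id, row.amount)
```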

I will try to answer according to what I understand your situation to be - please let me know if any of my presumptions are off and I'd be happy to update my answer.

In terms of using just one or the other: compared to a relational database such as MySQL or PostgreSQL hosted on Cloud SQL, BigQuery is probably the right fit if you are calculating complex analytical queries over your transactional data, but running hundreds of CRUD jobs to filter through transactional data is likely to be slower (as published comparisons of the two show) and cost-inefficient, assuming the queries you are writing parse entire tables without substantive filtering or any partitions, so that each job scans the whole table.
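
To make that concrete, here is a minimal sketch of one such point lookup through the Python client, with hypothetical project, dataset, table, and column names. Unless the table is partitioned or clustered on the lookup key, the bytes billed reflect a scan of the whole table even though only one row comes back:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and key: a parameterized point lookup returning one row.
sql = """
    SELECT *
    FROM `my_project.my_dataset.transactions`
    WHERE transaction_id = @txn_id
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("txn_id", "STRING", "TXN-0001"),
    ]
)

job = client.query(sql, job_config=job_config)
row = next(iter(job.result()), None)

# Without partitioning/clustering on transaction_id, this bills for a
# full-table scan no matter how few rows come back.
print(row, job.total_bytes_billed)
```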

With this in mind, if you expect a large amount of transactional data that is constantly updated and only need to access it through very simple queries, it may make sense to use a relational database not only to store the data but also as the base for any CRUD jobs you run against it. However, if you are only expecting data in the range of MBs, Cloud SQL can easily become more expensive than BigQuery. For example, while running 600-700 queries over several MBs of data came to a little less than 1 USD over the course of a month for a single project, even a Cloud SQL server with a cheap CPU core count and RAM, and no load other than simple queries, can run in the range of 60-100 USD per month (this of course varies with how it is configured and used).
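
As a back-of-envelope check on those numbers, all rates below are assumptions for illustration: on-demand BigQuery pricing of roughly 5 USD per TB scanned with the first TB per month free, and a small always-on Cloud SQL instance at roughly 0.10 USD per hour:

```python
# Back-of-envelope cost comparison; every rate here is an assumption, not a quote.
queries_per_month = 700
mb_scanned_per_query = 10                      # assumed average scan size
tb_scanned = queries_per_month * mb_scanned_per_query / 1_000_000

BQ_USD_PER_TB = 5.0                            # assumed on-demand rate
FREE_TB_PER_MONTH = 1.0                        # assumed free tier
bq_cost = max(0.0, tb_scanned - FREE_TB_PER_MONTH) * BQ_USD_PER_TB

HOURS_PER_MONTH = 730
CLOUDSQL_USD_PER_HOUR = 0.10                   # assumed small-instance rate
cloudsql_cost = HOURS_PER_MONTH * CLOUDSQL_USD_PER_HOUR

print(f"BigQuery:  {tb_scanned:.3f} TB scanned -> ${bq_cost:.2f}/month")
print(f"Cloud SQL: always-on instance -> ${cloudsql_cost:.2f}/month")
```

At that assumed scan volume BigQuery stays inside the free tier, while the always-on instance lands around 73 USD per month - consistent with the ranges above.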

If that range sounds more expensive than your current BigQuery project and setup, then at this scale it is probably fine to keep storing your data in BQ datasets and tables, and better to reconsider Cloud SQL when you actually need to leverage the unique characteristics of a relational database - which is another topic.

As another caveat, the answer above was written under the assumption that you are using BigQuery, and considering Cloud SQL, for data storage and querying internal to your organization or company. If you are serving this data through some sort of public-facing or web-hosted platform or service, I would strongly recommend a relational database, given the latency that querying and serving data from BigQuery incurs compared to reading from a Cloud SQL-hosted relational database.
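
For that kind of user-facing path, an indexed point lookup against a Cloud SQL PostgreSQL instance typically returns in milliseconds, whereas every BigQuery query goes through a job lifecycle that usually adds seconds of overhead. A minimal sketch, with all connection details and names as placeholders:

```python
import psycopg2  # third-party PostgreSQL driver; all connection values are placeholders

conn = psycopg2.connect(
    host="127.0.0.1",      # e.g. via the Cloud SQL Auth Proxy
    port=5432,
    dbname="appdb",
    user="reader",
    password="example",
)
with conn, conn.cursor() as cur:
    # With a B-tree index on transaction_id, this lookup is served in
    # milliseconds rather than going through a BigQuery job lifecycle.
    cur.execute(
        "SELECT * FROM transactions WHERE transaction_id = %s",
        ("TXN-0001",),
    )
    print(cur.fetchone())
conn.close()
```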
