
Extracting BigQuery data to R for preprocessing and analysis

I have a large dataset in BigQuery and write SQL queries against it, which return results quickly. However, I want to use R/Python for data preprocessing. My table has approximately 200M records, and R is very slow at that scale.

So, given the amount of data, should I stick with BigQuery queries, or is there another way of working with R/Python that is also fast? Alternatively, does Google offer a product that can produce data summaries without writing SQL queries?

BigQuery is generally the best solution for fast processing of large amounts of data. If you want to avoid SQL queries, though, you might consider preprocessing your data with a Dataflow pipeline or with Dataprep (beware, though: the latter is in beta).

As Lefteris mentioned before, BigQuery is likely the solution that scales best.

If you still want integration with R, have you looked at bigrquery?

https://github.com/r-dbi/bigrquery

https://cloud.google.com/blog/big-data/2017/04/google-cloud-platform-for-data-scientists-using-r-with-google-bigquery
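For illustration, here is a minimal sketch of that workflow with bigrquery: the heavy aggregation over the 200M rows runs inside BigQuery, and only the small summarized result is downloaded into R. The project, dataset, table, and column names are hypothetical placeholders.

    # A minimal sketch: push the aggregation down to BigQuery and pull
    # only the summary into R. Project/dataset/table names are hypothetical.
    library(bigrquery)

    billing <- "my-gcp-project"  # hypothetical billing project ID

    sql <- "
    SELECT category, COUNT(*) AS n, AVG(value) AS avg_value
    FROM `my-gcp-project.my_dataset.my_table`
    GROUP BY category
    "

    tb <- bq_project_query(billing, sql)  # query executes server-side in BigQuery
    summary_df <- bq_table_download(tb)   # download only the aggregated rows
    head(summary_df)

bigrquery also provides a DBI backend, so with dbplyr you can write dplyr pipelines that are translated to SQL and executed in BigQuery, giving you the same server-side pushdown without hand-writing queries.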

