Extracting BigQuery data to R for preprocessing and analysis

I have a large dataset in BigQuery and I write SQL queries against it, which return results quickly. However, I would like to use R or Python for data preprocessing. My table has approximately 200M records, and R is very slow at that size.

So, given the amount of data, should I stick with BigQuery queries, or is there another way of working with R/Python that is also fast? Alternatively, does Google offer a product that can create data summaries without writing SQL queries?

BigQuery is generally the best solution for fast processing of large amounts of data. If you want to avoid SQL queries, though, you might consider preprocessing your data with a Dataflow pipeline or with Dataprep (beware, though: the latter is in beta).
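
To illustrate the first point, here is a minimal Python sketch of letting BigQuery do the heavy aggregation so that only a small summary table is pulled into your analysis environment, rather than all 200M rows. The table name `my_project.my_dataset.events` and the columns `country` and `amount` are hypothetical placeholders, and the helpers `build_summary_query` and `fetch_summary` are names made up for this example. It assumes the `google-cloud-bigquery` client library (plus `pandas` and `db-dtypes`) is installed and that default credentials are configured.

```python
import re

# Hypothetical table name for illustration; substitute your own.
TABLE = "my_project.my_dataset.events"

# Accept only plain column identifiers, since they are pasted into the SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_summary_query(table, group_col, value_col):
    """Return SQL that aggregates inside BigQuery, one row per group."""
    for name in (group_col, value_col):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected column name: {name!r}")
    return (
        f"SELECT {group_col}, COUNT(*) AS n_rows, AVG({value_col}) AS mean_value\n"
        f"FROM `{table}`\n"
        f"GROUP BY {group_col}"
    )


def fetch_summary(table, group_col, value_col):
    """Run the query and return the (small) result as a pandas DataFrame."""
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up application default credentials
    sql = build_summary_query(table, group_col, value_col)
    return client.query(sql).to_dataframe()
```

Calling `fetch_summary(TABLE, "country", "amount")` would then return one row per country instead of the raw records, which is usually small enough to work with comfortably in Python (or to export for R).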

As Lefteris mentioned earlier, BigQuery might be the solution that scales best.

If you still want integration with R, have you looked at bigrquery?

https://github.com/r-dbi/bigrquery

https://cloud.google.com/blog/big-data/2017/04/google-cloud-platform-for-data-scientists-using-r-with-google-bigquery
