Best approach to combine multiple MySQL tables in R

What is the best approach to combine multiple MySQL tables in R? For instance, I need to `rbind` 14 large MySQL tables (each >100k rows by 100 columns). I tried the approach below, which consumed most of my memory and timed out on the MySQL side. Is there an alternative solution? I do not need to fetch the whole table; I just need to group the combined data by a couple of variables and calculate some metrics.

station_tbl_t <- dbSendQuery(my_db, "select * from tbl_r3_300ft
                  union all
                  select * from tbl_r4_350ft
                  union all
                  select * from tbl_r5_400ft
                  union all
                  select * from tbl_r6_500ft
                  union all
                  select * from tbl_r7_600ft
                  union all
                  select * from tbl_r8_700ft
                  union all
                  select * from tbl_r9_800ft
                  union all
                  select * from tbl_r10_900ft
                  union all
                  select * from tbl_r11_1000ft
                  union all
                  select * from tbl_r12_1200ft
                  union all
                  select * from tbl_r13_1400ft
                  union all
                  select * from tbl_r14_1600ft
                  union all
                  select * from tbl_r15_1800ft
                  union all
                  select * from tbl_r16_2000ft
                  ")

Consider importing the MySQL table data iteratively, one table at a time, and then row binding the results in R. Be sure to select only the columns you need, to save on overhead:

tbls <- c("tbl_r3_300ft", "tbl_r4_350ft", "tbl_r5_400ft", 
          "tbl_r6_500ft", "tbl_r7_600ft", "tbl_r8_700ft", 
          "tbl_r9_800ft", "tbl_r10_900ft", "tbl_r11_1000ft", 
          "tbl_r12_1200ft", "tbl_r13_1400ft", "tbl_r14_1600ft", 
          "tbl_r15_1800ft", "tbl_r16_2000ft")

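# Replace Col1, Col2, Col3 with only the columns you need
sql <- "SELECT Col1, Col2, Col3 FROM"

# A failed query returns NULL (with a warning), which the row binds below skip
dfList <- lapply(paste(sql, tbls), function(s) {
             tryCatch(dbGetQuery(my_db, s),
                      error = function(e) {
                        warning("Query failed: ", s, ": ", conditionMessage(e))
                        NULL
                      })
          })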

# ROW BIND VERSIONS ACROSS PACKAGES (ALTERNATIVES: USE ANY ONE)
master_df <- base::do.call(rbind, dfList)
master_df <- plyr::rbind.fill(dfList)
master_df <- dplyr::bind_rows(dfList)
master_df <- data.table::rbindlist(dfList)
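
Since the question only needs grouped metrics rather than the raw rows, another option may be to let MySQL aggregate each table and bring back only the small per-table summaries. The sketch below is one way this could look, not a tested solution for this schema. It assumes `Col1` and `Col2` are the grouping variables and `Col3` is a numeric column (all placeholders), and the helper names `build_agg_sql`, `combine_partials` and `summarise_tables` are made up for illustration. Counts and sums are returned per table because they can be added across tables, whereas per-table averages cannot simply be averaged again.

```r
# Build a per-table aggregate query.
# Col1/Col2 (grouping) and Col3 (numeric) are placeholder column names.
build_agg_sql <- function(tbl) {
  paste0("SELECT Col1, Col2, COUNT(Col3) AS n, SUM(Col3) AS sum_col3 ",
         "FROM ", tbl, " GROUP BY Col1, Col2")
}

# Combine per-table partial aggregates into overall totals and means.
# Note: aggregate() with a formula drops rows containing NA values.
combine_partials <- function(partials) {
  all_parts <- do.call(rbind, partials)
  # Coerce in case the driver returns counts as a non-numeric type
  all_parts$n <- as.numeric(all_parts$n)
  all_parts$sum_col3 <- as.numeric(all_parts$sum_col3)
  out <- aggregate(cbind(n, sum_col3) ~ Col1 + Col2,
                   data = all_parts, FUN = sum)
  out$mean_col3 <- out$sum_col3 / out$n
  out
}

# Run one small aggregate query per table and combine the results.
# `con` is assumed to be an open DBI connection (my_db in the question).
summarise_tables <- function(con, tbls) {
  partials <- lapply(tbls, function(tbl) {
    DBI::dbGetQuery(con, build_agg_sql(tbl))
  })
  combine_partials(partials)
}
```

With the connection and table vector from above, this would be called as `summary_df <- summarise_tables(my_db, tbls)`. Each query returns at most one row per group, so memory use in R stays small and no single long-running `UNION ALL` query is needed.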
