
Make a faster for-loop for R with rbinding dataframes

I am trying to bind dataframes that come from JSON data.

I tried using rbind.fill in a for loop, which works for small data, but it takes too long for more than 100k rows. In particular, I would like to know whether there is any way to vectorize this to make it faster, rather than growing an empty dataframe.

big[1,1] contains a JSON string that looks like this:

"[{\"latitude\":3750772,\"longitude\":12714673},
{\"latitude\":3750957,\"longitude\":12714793},
{\"latitude\":3751111,\"longitude\":12714954},
{\"latitude\":3751215,\"longitude\":12715155},
{\"latitude\":3751174,\"longitude\":12715295},
{\"latitude\":3751153,\"longitude\":12715174}]"

fromJSON(big[1,1]) yields a 6 x 2 dataframe.

library(jsonlite)
library(plyr)

big <- fromJSON('RT_data_this_should_be_used_for_rt_analysis.json')
big[1, 1]
fromJSON(big[1, 1]) # A 6 x 2 dataframe
row <- nrow(big)    # Number of rows, which is also the number of 'rt's

result <- data.frame(latitude = integer(), longitude = integer()) # Empty dataframe to store the values
for (i in 1:row) {
  result <- rbind.fill(result, fromJSON(big[i, 1])) # Bind the dataframes
}
result[, 1] <- result[, 1] / 100000
result[, 2] <- result[, 2] / 100000 # Adjust latitude and longitude
result # A (6*row) x 2 dataframe

This is untested, but something similar might work:

result_list <- lapply(big[, 1], "fromJSON")
result <- do.call("rbind.fill", result_list)
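As a self-contained illustration of the lapply/do.call pattern above (using a small hypothetical json_col vector in place of big[, 1], since the original JSON file is not available):

```r
library(jsonlite)
library(plyr)

# Toy stand-in for big[, 1]: each element is a JSON array of coordinate objects
json_col <- c(
  '[{"latitude":3750772,"longitude":12714673},{"latitude":3750957,"longitude":12714793}]',
  '[{"latitude":3751111,"longitude":12714954}]'
)

# Parse every JSON string into a data frame, then bind them all in one call
result_list <- lapply(json_col, fromJSON)
result <- do.call(rbind.fill, result_list)

# Scale to decimal degrees, as in the original loop
result$latitude  <- result$latitude / 100000
result$longitude <- result$longitude / 100000
```

Binding once at the end avoids the quadratic cost of copying the growing dataframe on every iteration.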

Probably not the most elegant answer, but you could read the arrays into a list and then use purrr's reduce function to bind all the rows together.


library(purrr)

resultlist <- vector("list", row)

for (i in 1:row) {
  resultlist[[i]] <- fromJSON(big[i, 1])
}

result <- reduce(resultlist, rbind.fill)

I expect this should be much faster, since the dataframe is not re-grown on every loop iteration.
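Not part of either answer above, but worth noting: if the plyr dependency is not essential, data.table::rbindlist binds a list of data frames in a single pass and is usually the fastest option for many small frames (dplyr::bind_rows is a similar alternative). A minimal sketch, again using a hypothetical json_col in place of big[, 1]:

```r
library(jsonlite)
library(data.table)

# Hypothetical stand-in for big[, 1]; the real column comes from the JSON file
json_col <- c(
  '[{"latitude":3750772,"longitude":12714673},{"latitude":3750957,"longitude":12714793}]',
  '[{"latitude":3751111,"longitude":12714954}]'
)

result_list <- lapply(json_col, fromJSON)     # parse each JSON string to a dataframe
result <- rbindlist(result_list, fill = TRUE) # bind the whole list at once
```

fill = TRUE pads any missing columns with NA, mirroring rbind.fill's behavior.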
