Parallelize user-defined function using apply family in R
I have a script that takes too long to compute and I'm trying to parallelize its execution.
The script basically loops through each row of a data frame and performs some calculations, as shown below:
my.df = data.frame(id=1:9,value=11:19)
sumPrevious <- function(df,df.id){
sum(df[df$id<=df.id,"value"])
}
for(i in 1:nrow(my.df)){
print(sumPrevious(my.df,my.df[i,"id"]))
}
I'm starting to learn to parallelize code in R, which is why I first want to understand how I could do this with an apply-like function (e.g. sapply, lapply, mapply).
I've tried multiple things but nothing has worked so far:
mapply(sumPrevious,my.df,my.df$id) # Error in df$id : $ operator is invalid for atomic vectors
Using the parallel package in R, you can use the mclapply() function. You will need to adjust your code a little bit to make it run in parallel.
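library(parallel)
my.df = data.frame(id=1:9,value=11:19)
# i is a row index; sum the values of all rows whose id is <= the id of row i
sumPrevious <- function(i,df){
df.id = df$id[i]
sum(df[df$id<=df.id,"value"])
}
# Example value only, adjust to your machine; mclapply relies on forking, so use 1 on Windows
no.of.cores = 2
mclapply(X = 1:nrow(my.df),FUN = sumPrevious,my.df,mc.preschedule = TRUE,mc.cores = no.of.cores)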
This code will run sumPrevious in parallel on no.of.cores cores of your machine.
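As a rough sketch of how the mclapply() call might be completed in practice: the answer does not say how no.of.cores is chosen, so the version below picks it with detectCores(), capped at 2 (an arbitrary choice for illustration), and falls back to a single core on Windows, where mclapply() cannot fork. The names res and res.vec are made up for this example.

```r
library(parallel)

my.df <- data.frame(id = 1:9, value = 11:19)

# Same function as in the answer: i is a row index, df is the data frame.
sumPrevious <- function(i, df) {
  df.id <- df$id[i]
  sum(df[df$id <= df.id, "value"])
}

# Assumption: at most 2 workers; mclapply needs mc.cores = 1 on Windows.
no.of.cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  min(2L, detectCores(), na.rm = TRUE)
}

# df = my.df is forwarded to sumPrevious through the ... argument.
res <- mclapply(X = seq_len(nrow(my.df)), FUN = sumPrevious, df = my.df,
                mc.preschedule = TRUE, mc.cores = no.of.cores)

# mclapply returns a list with one element per row; flatten it to a vector.
res.vec <- unlist(res)
print(res.vec)
```

For a data frame this small the cost of starting worker processes will likely outweigh any gain, so the benefit only shows up when each call to the function is expensive.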
Well, this is fun to play with. You need something like the following:
mapply(sumPrevious,list(my.df),my.df$id)
For sapply, since the first input of sumPrevious is the data frame, you will have to define a wrapper function that swaps the argument order so that the data frame is recognized correctly:
sapply(my.df$id,function(x,y) sumPrevious(y,x),my.df)
I prefer mapply here since we can pass the data frame directly as the first argument, but it has to be passed as a whole. A data frame is itself a list of columns, so mapply would otherwise iterate over its columns, which is what caused the error in the question. That's why you have to wrap it in the function list.
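To make the point about list concrete, here is a small sketch. It compares the list() wrapping with mapply's MoreArgs argument, which is an alternative not mentioned in the answer; res.list and res.more are names made up for this example.

```r
my.df <- data.frame(id = 1:9, value = 11:19)

# Original signature from the question: data frame first, id second.
sumPrevious <- function(df, df.id) {
  sum(df[df$id <= df.id, "value"])
}

# A data frame is a list of columns, so mapply would iterate over columns.
n.cols <- length(my.df)           # number of columns, not rows
n.wrapped <- length(list(my.df))  # one element, recycled for every id

# Wrapping in list() passes the whole data frame on each call.
res.list <- mapply(sumPrevious, list(my.df), my.df$id)

# MoreArgs holds arguments that should be passed as-is, without iteration.
res.more <- mapply(sumPrevious, df.id = my.df$id,
                   MoreArgs = list(df = my.df))

print(res.list)
print(res.more)
```

Both calls should give the same vector; MoreArgs simply states the intent (a fixed argument) more explicitly.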
Map is a wrapper of mapply and thus would just present the solution in a list format. Try it. Also, lapply is similar to sapply, except that sapply simplifies the results into a vector or array format where possible, while lapply gives the same results as a list.
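A short sketch of the difference in return types, using the function from the question (map.res, l.res and s.res are names made up for this example):

```r
my.df <- data.frame(id = 1:9, value = 11:19)

sumPrevious <- function(df, df.id) {
  sum(df[df$id <= df.id, "value"])
}

# Map: like mapply, but the result is always a list.
map.res <- Map(sumPrevious, list(my.df), my.df$id)

# lapply: iterates over the ids and returns a list.
l.res <- lapply(my.df$id, function(x) sumPrevious(my.df, x))

# sapply: same iteration, but the result is simplified to a vector.
s.res <- sapply(my.df$id, function(x) sumPrevious(my.df, x))

print(is.list(map.res))
print(is.list(l.res))
print(is.list(s.res))
```

Calling unlist() on either list should recover the same values that sapply returns.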
Though it seems that whatever you are trying to do can simply be done with the cumsum function.
cumsum(my.df$value)
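A quick check of that equivalence, as a sketch: cumsum matches the row-by-row approach only when the rows are sorted by id and the ids are unique, which holds for the example data. The shuffled data frame below is a hypothetical case added to show one way to handle unsorted rows.

```r
my.df <- data.frame(id = 1:9, value = 11:19)

sumPrevious <- function(df, df.id) {
  sum(df[df$id <= df.id, "value"])
}

# Sorted ids: cumsum gives the same result as the row-by-row version.
loop.res <- sapply(my.df$id, function(x) sumPrevious(my.df, x))
cum.res <- cumsum(my.df$value)
print(all.equal(loop.res, cum.res))

# Hypothetical unsorted case: accumulate in id order, then put each
# running total back at the position of its original row.
shuffled <- my.df[c(3, 1, 2, 9, 8, 7, 4, 5, 6), ]
ord <- order(shuffled$id)
cum.shuffled <- numeric(nrow(shuffled))
cum.shuffled[ord] <- cumsum(shuffled$value[ord])

loop.shuffled <- sapply(shuffled$id, function(x) sumPrevious(shuffled, x))
print(all.equal(loop.shuffled, cum.shuffled))
```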