
dplyr::mutate: temporary expensive variable as input to several other operations, rowwise

It is a little tricky to show my problem with real data, but I hope the following explains it:

library(dplyr)

tibble(a = c(1, 2), b = c(3, 4)) %>%
  rowwise() %>%
  mutate(c = a * b, d = c - 1, e = c + 2) %>%
  ungroup()

In the above example, of course, rowwise is not needed.

Now let's suppose that the calculation that produces c is time consuming and not vectorized, and that c is a large object. So you don't want to execute it twice, and you want it cleared from memory after each row's calculation.

Is there a clever way to do this? Perhaps with purrr::map?

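Here is an answer using purrr's invoke_rows.

library(purrr)  # invoke_rows() has since moved to the purrrlyr package; load that instead on newer purrr versions

MyDf <- data.frame(a = c(1, 2), b = c(3, 4))
invoke_rows(.d = MyDf, .f = function(a, b) {
  c <- a * b  # computed once per row, local to this call
  c(d = c - 1,
    e = c + 2)
}, .collate = "cols")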

Update

In response to the comment of @JanStanstrup: if you have another column that you want in the output but that does not appear in the calculation, you can do this:

MyDf <- data.frame(a = c(1, 2), b = c(3, 4), dummy = c(6, 7))
invoke_rows(.d = MyDf, .f = function(a, b, ...) {
  c <- a * b
  c(d = c - 1,
    e = c + 2)
}, .collate = "cols")

Here, dummy and any other columns are passed to the .f function via the ... argument. They are not used in that function, so they are simply carried along to the output.
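Since the question asks about purrr::map, here is a rough sketch of the same idea with purrr::pmap, which calls a function once per row of a data frame. This is not from the original answer. The name expensive_step is a hypothetical stand-in for the slow, non-vectorized calculation, and tmp plays the role of c. Because tmp only exists inside the function call, it should become eligible for garbage collection once that row's call returns.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the expensive, non-vectorized calculation.
expensive_step <- function(a, b) a * b

my_df <- data.frame(a = c(1, 2), b = c(3, 4), dummy = c(6, 7))

# pmap() calls the function once per row, matching columns to arguments
# by name; `...` absorbs columns the function does not use (here, dummy).
per_row <- pmap(my_df, function(a, b, ...) {
  tmp <- expensive_step(a, b)  # computed once per row, local to this call
  data.frame(d = tmp - 1, e = tmp + 2)
})

# Stack the one-row results and attach them to the original columns.
result <- bind_cols(my_df, bind_rows(per_row))
```

Unlike the invoke_rows version, the untouched columns such as dummy are kept here by binding the results back onto my_df explicitly.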

