
How to apply a pre-designed function in dplyr with group_by

I have a pre-designed function, shown below, which I have checked and which works well.

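foo <- function(tmp2) {
  # start with no row flagged, then flag the first row
  tmp2[, "frontier_dummy"] <- 0
  A <- tmp2[1, "sd_R"]  # minimum sd seen so far
  tmp2[1, "frontier_dummy"] <- 1

  # seq_len(...)[-1] is empty for a one-row input, unlike 2:nrow(tmp2)
  for (i in seq_len(nrow(tmp2))[-1]) {
    # check whether sd_i < A
    if (tmp2[i, "sd_R"] < A) {
      tmp2[i, "frontier_dummy"] <- 1
      A <- tmp2[i, "sd_R"]
    }
  }
  return(tmp2)
}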

I would like to apply this function in a dplyr pipeline together with group_by. My code is below:

trial2 <- tmp2 %>%
  group_by(subset) %>%
  arrange(desc(mean_R), desc(sd_R)) %>%
  foo()

It runs, but when I checked the output, it had not split the data into subsets and run the function on each subset separately. Can anyone help me figure out why? How can I modify my code?

Thanks a lot!

The data:

,id,mean_R,Var_R,sd_R,mean_over_sd,mean_ROI,subset
1,11813,3385.833333,3868920.967,1966.957286,1.7213558,55832.47936,3
2,4049,2150.625,4000830.839,2000.207699,1.075200841,67073.8136,6
3,11432,1959.4,2508571.822,1583.847159,1.23711432,69286.36564,4
4,15166,1600.357143,13464947.17,3669.461428,0.436128618,280618.3547,3
5,12061,1509.5,44193,210.221312,7.180527921,25810.03176,3
6,7749,1452.4,297037.3,545.0112843,2.664898951,71970.11657,2
7,10711,1433.461538,14059975.44,3749.663376,0.382290727,131054.4251,2
8,3068,1252.25,333918.25,577.8565999,2.167060133,42896.49156,4
9,11335,1111.125,133857.8393,365.8658761,3.036973581,61310.80272,2
10,5770,692.8,196306.1778,443.06453,1.563654847,59234.55409,2
11,10089,679.375,56943.58333,238.6285468,2.846998019,60651.76025,1
12,10674,674.6666667,241327.8667,491.2513274,1.373363549,24164.31565,2
13,11435,531.8333333,669476.5667,818.2154769,0.649991779,11331.40683,2
14,19957,518.16,314590.14,560.8833569,0.923828446,70713.39092,1
15,22841,430.2,114384.0833,338.2071604,1.272001455,49212.42332,2
16,10180,417.4615385,18061.4359,134.3928417,3.106278082,62303.42163,1
17,4390,326,32257.33333,179.6032665,1.815111754,17219.19576,2
18,15514,227,5875.333333,76.65072298,2.961485439,30676.16867,3
19,17619,212,57981.42857,240.7933317,0.880423052,57932.1208,1

With dplyr (or even base R) there should be a better way to write the foo function. However, since you haven't shared your data or explained exactly what foo is doing, we keep foo untouched and change the way we apply it.
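For reference, here is a minimal sketch of what such a rewrite might look like. It assumes foo is only meant to flag, within each subset, the rows whose sd_R is lower than that of every earlier row after sorting. The toy data frame is made up for illustration, and the flag comes out as an integer rather than a double.

```r
library(dplyr)

# Toy data standing in for tmp2 (values are invented for illustration)
toy <- data.frame(
  subset = c(1, 1, 1, 2, 2, 3),
  mean_R = c(30, 20, 10, 50, 40, 5),
  sd_R   = c(9, 4, 6, 3, 7, 2)
)

flagged <- toy %>%
  group_by(subset) %>%
  # sort inside each group: highest mean_R first, ties by highest sd_R
  arrange(desc(mean_R), desc(sd_R), .by_group = TRUE) %>%
  # a row is on the frontier when its sd_R is below the running minimum
  # of all earlier rows in the group; the first row compares against Inf
  mutate(frontier_dummy = as.integer(sd_R < lag(cummin(sd_R), default = Inf))) %>%
  ungroup()

flagged
```

Because mutate() evaluates cummin() and lag() per group, no explicit split is needed in this version.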

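Piping a grouped data frame into foo() still hands foo the whole table in one piece: group_by() only records the grouping, and foo never looks at it. You can use group_split() to split the data into separate data frames based on the unique values in subset, then apply foo to each one with map_df(), which row-binds the results.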

library(dplyr)
library(purrr)

result <- tmp2 %>%
  arrange(desc(mean_R), desc(sd_R)) %>%
  group_split(subset) %>%
  map_df(foo)

result
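
If you would rather stay inside a group_by() pipeline, dplyr's group_modify() may also work. The sketch below is untested against the real tmp2: it uses a compact stand-in for foo (same flagging logic, written with $ indexing) and a made-up toy data frame.

```r
library(dplyr)

# Compact stand-in for the question's foo(): flag rows whose sd_R is a new
# running minimum. seq_len(...)[-1] keeps the loop empty for one-row groups.
foo <- function(d) {
  d$frontier_dummy <- 0
  running_min <- d$sd_R[1]
  d$frontier_dummy[1] <- 1
  for (i in seq_len(nrow(d))[-1]) {
    if (d$sd_R[i] < running_min) {
      d$frontier_dummy[i] <- 1
      running_min <- d$sd_R[i]
    }
  }
  d
}

# Toy data standing in for tmp2 (values are invented for illustration)
toy <- data.frame(
  subset = c(1, 1, 1, 2, 2, 3),
  mean_R = c(30, 20, 10, 50, 40, 5),
  sd_R   = c(9, 4, 6, 3, 7, 2)
)

result <- toy %>%
  arrange(desc(mean_R), desc(sd_R)) %>%
  group_by(subset) %>%
  # .x is the data for one group, without the grouping column
  group_modify(~ foo(.x)) %>%
  ungroup()

result
```

group_modify() calls the function once per group and puts the grouping column back, so the output stays a single data frame ordered by subset.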
