
R: using customised function in dplyr

Sample data:

      library(tidyverse)
      set.seed(123)

      dat <- tibble(
        year = rep(1980:2015, each = 100),
        day = rep(200:299, times = 36),
        rain = sample(0:17, size = 100*36, replace = TRUE),
        PETc =  sample(rnorm(100*36)),
        ini.t = rep(10:45, each = 100 ))

I have a function that operates on a data frame:

    my.func <- function(df, initial, thres, upper.limit){

      df$paw <- rep(NA, nrow(df))
      df$aetc <- rep(NA, nrow(df))
      df$sw <- rep(NA, nrow(df))

      # `initial` carries the previous row's sw into the next iteration
      for(n in seq_len(nrow(df))){
        df$paw[n] <- df$rain[n] + initial
        df$aetc[n] <- ifelse(df$paw[n] >= thres, df$PETc[n], (df$paw[n]/thres) * df$PETc[n])
        df$aetc[n] <- ifelse(df$aetc[n] > df$paw[n], df$paw[n], df$aetc[n])
        df$sw[n] <- initial + df$rain[n] - df$aetc[n]
        df$sw[n] <- ifelse(df$sw[n] > upper.limit, upper.limit, ifelse(df$sw[n] < 0, 0, df$sw[n]))
        initial <- df$sw[n]
      }
      return(df)
    }

    thres <- 110
    upper.limit <- 200

Applying the above function to a single year:

        dat.1980 <- dat[dat$year == 1980,]

        my.func(dat.1980, initial = dat.1980$ini.t[1], thres, upper.limit)

How do I apply this function to each year? I thought of using dplyr:

        dat %>% group_by(year)  # ...then run my function on each year

Also, since there are 36 years, there will be 36 data frames returned. How do I bind these data frames row-wise?

You were on the right track. `do` lets you apply a function to each group:

    dat %>%
      group_by(year) %>%
      do(my.func(., initial = head(.$ini.t, 1), thres, upper.limit))

# Groups: year [36]
    # year   day  rain    PETc ini.t   paw    aetc    sw
   # <int> <int> <int>   <dbl> <int> <dbl>   <dbl> <dbl>
 # 1  1980   200     5  0.968     10  15.0  0.132   14.9
 # 2  1980   201    14  0.413     10  28.9  0.108   28.8
 # 3  1980   202     7 -0.912     10  35.8 -0.296   36.1
 # 4  1980   203    15 -0.337     10  51.1 -0.156   51.2
 # 5  1980   204    16  0.412     10  67.2  0.252   67.0
 # 6  1980   205     0 -0.923     10  67.0 -0.562   67.5
 # 7  1980   206     9  1.17      10  76.5  0.813   75.7
 # 8  1980   207    16  0.0542    10  91.7  0.0452  91.7
 # 9  1980   208     9 -0.293     10 101   -0.268  101  
# 10  1980   209     8  0.0788    10 109    0.0781 109  
# ... with 3,590 more rows
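In newer dplyr releases `do()` is marked as superseded, and `group_modify()` covers the same use case. Below is a minimal sketch of that variant, not a tested drop-in: it assumes a dplyr version that provides `group_modify()`, repeats `my.func` from the question so it runs on its own, and uses a small 3-year toy data set (my own stand-in, not the original `dat`).

```r
library(dplyr)

# Same function as in the question, repeated so the sketch is self-contained.
my.func <- function(df, initial, thres, upper.limit) {
  df$paw  <- rep(NA, nrow(df))
  df$aetc <- rep(NA, nrow(df))
  df$sw   <- rep(NA, nrow(df))
  for (n in seq_len(nrow(df))) {
    df$paw[n]  <- df$rain[n] + initial
    df$aetc[n] <- ifelse(df$paw[n] >= thres, df$PETc[n], (df$paw[n] / thres) * df$PETc[n])
    df$aetc[n] <- ifelse(df$aetc[n] > df$paw[n], df$paw[n], df$aetc[n])
    df$sw[n]   <- initial + df$rain[n] - df$aetc[n]
    df$sw[n]   <- ifelse(df$sw[n] > upper.limit, upper.limit, ifelse(df$sw[n] < 0, 0, df$sw[n]))
    initial    <- df$sw[n]
  }
  df
}

# Toy data (assumption): 3 years x 5 days instead of 36 years x 100 days.
set.seed(123)
dat <- data.frame(
  year  = rep(1980:1982, each = 5),
  day   = rep(200:204, times = 3),
  rain  = sample(0:17, size = 15, replace = TRUE),
  PETc  = rnorm(15),
  ini.t = rep(10:12, each = 5)
)
thres <- 110
upper.limit <- 200

# .x is the data for one year (without the grouping column);
# group_modify() adds `year` back and stacks the per-group results.
res <- dat %>%
  group_by(year) %>%
  group_modify(~ my.func(.x, initial = .x$ini.t[1], thres, upper.limit)) %>%
  ungroup()
```

As with `do()`, the result is a single data frame, so no separate binding step is needed.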

The `purrr::map` functions are the method du jour, but I think in this case it's a stylistic choice.

We can `split` by 'year' and then use `map` to apply `my.func` to each of the split datasets in the `list`:

    library(purrr)
    dat %>%
      split(.$year) %>%
      map_df(~my.func(.x, initial = .x$ini.t[1], thres, upper.limit))
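The same split / apply / bind pattern can be written in base R, which makes the row-wise binding step from the question explicit. This is only a sketch: `toy.func` is a hypothetical, much simpler stand-in for `my.func` (it just carries a running total forward from `initial`), and the data frame is made up for illustration.

```r
# Hypothetical stand-in for my.func: a running balance that starts at `initial`.
toy.func <- function(df, initial) {
  df$sw <- initial + cumsum(df$rain)
  df
}

# Made-up data: 3 years, 3 rows each, with a per-year starting value.
dat <- data.frame(
  year  = rep(1980:1982, each = 3),
  rain  = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  ini.t = rep(c(10, 20, 30), each = 3)
)

# 1. split: a named list with one data frame per year
pieces <- split(dat, dat$year)

# 2. apply: run the function on each piece, using that year's first ini.t
results <- lapply(pieces, function(d) toy.func(d, initial = d$ini.t[1]))

# 3. bind: stack the per-year data frames row-wise into one data frame
combined <- do.call(rbind, results)
rownames(combined) <- NULL
```

Swapping `toy.func(d, initial = d$ini.t[1])` for `my.func(d, initial = d$ini.t[1], thres, upper.limit)` should give the per-year results for the original data; `dplyr::bind_rows(results)` is an alternative for step 3.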
