
Looping over grouped data using the nls function in R

I have a dataset grouped by GaugeID. I want to fit an nls model to each group and get one output value per group.

library(tidyverse)
library(stats)

# sample of data (yearly), first column is gauge (grouping variable), year, then two formula inputs PETvP and ETvP 

# A tibble: 20 x 4
   GaugeID  WATERYR  PETvP  ETvP 
   <chr>      <dbl>  <dbl> <dbl>  
 1 06892000    1981  0.854 0.754 
 2 06892000    1982  0.798 0.708 
 3 06892000    1983  1.12  0.856 
 4 06892000    1984  0.905 0.720  
 5 06892000    1985  0.721 0.618 
 6 06892000    1986  0.717 0.625 
 7 06892000    1987  0.930 0.783 
 8 06892000    1988  1.57  0.945 
 9 06892000    1989  1.15  0.739 
10 06892000    1990  0.933 0.805 
11 08171300    1981  0.854 0.754 
12 08171300    1982  0.798 0.708 
13 08171300    1983  1.12  0.856 
14 08171300    1984  0.905 0.720  
15 08171300    1985  0.721 0.618 
16 08171300    1986  0.717 0.625 
17 08171300    1987  0.930 0.783 
18 08171300    1988  1.57  0.945 
19 08171300    1989  1.15  0.739 
20 08171300    1990  0.933 0.805 

# attempted for loop
for (i in unique(yearly$GaugeID)) {
   myValue = nls(ETvP[i] ~ I(1 + PETvP[i] - (1 + PETvP[i]^(w))^(1/w)), data = yearly,
    start =  list(w = 2), trace = TRUE)
}

I get the following error:

Error in model.frame.default(formula = ~ETvP + i + PETvP, data = yearly) : 
  variable lengths differ (found for 'i')

I haven't found much information about looping with the nls function. Essentially, I am fitting curves and need the curve parameter (w) as output for each gauge. The fit works if I apply the formula to a single gauge (i.e. if I subset the data to the first gauge), but not when I try to use it on the entire data frame with grouped data. For example, this works:

# gaugeA 
# A tibble: 10 x 4
   GaugeID  WATERYR  PETvP  ETvP 
   <chr>      <dbl>  <dbl> <dbl>  
 1 06892000    1981  0.854 0.754 
 2 06892000    1982  0.798 0.708 
 3 06892000    1983  1.12  0.856 
 4 06892000    1984  0.905 0.720  
 5 06892000    1985  0.721 0.618 
 6 06892000    1986  0.717 0.625 
 7 06892000    1987  0.930 0.783 
 8 06892000    1988  1.57  0.945 
 9 06892000    1989  1.15  0.739 
10 06892000    1990  0.933 0.805 

test = nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), data = gaugeA, 
    start =  list(w = 2), trace = TRUE)

1.574756    (4.26e+00): par = (2)
0.2649549   (1.46e+00): par = (2.875457)
0.09466832  (3.32e-01): par = (3.59986)
0.08543699  (2.53e-02): par = (3.881397)
0.08538308  (9.49e-05): par = (3.907099)
0.08538308  (1.13e-06): par = (3.907001)
> test
Nonlinear regression model
  model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
   data: gaugeA
    w 
3.907 
 residual sum-of-squares: 0.08538

Number of iterations to convergence: 5 
Achieved convergence tolerance: 1.128e-06

Any ideas on how I can get the subset results for my entire grouped dataframe? It has over 600 different gauges in it. Thank you in advance.

Any of the following will work:

Using summarise :

df %>%
  group_by(GaugeID) %>%
  summarise(result = list(nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), 
                              data = cur_data(), 
                start =  list(w = 2)))) %>%
  pull(result)

[[1]]
Nonlinear regression model
  model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
   data: cur_data()
    w 
3.607 
 residual sum-of-squares: 0.01694

Number of iterations to convergence: 5 
Achieved convergence tolerance: 7.11e-08

[[2]]
Nonlinear regression model
  model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
   data: cur_data()
    w 
1.086 
 residual sum-of-squares: 0.1532

Number of iterations to convergence: 5 
Achieved convergence tolerance: 2.685e-07
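
If only the fitted w per gauge is needed rather than the full model objects, coef() can be applied inside summarise. The following is a minimal sketch on synthetic data; the gauge labels "A" and "B", the true w values, and the helper curve_fn are assumptions made up for illustration, not the asker's data. cur_data() is kept to match the code above; recent dplyr releases deprecate it in favour of pick().

```r
library(dplyr)

# Hypothetical helper: the curve from the question
curve_fn <- function(x, w) 1 + x - (1 + x^w)^(1/w)

# Synthetic data (an assumption): two gauges simulated with known w plus small noise
set.seed(1)
x <- seq(0.7, 1.6, length.out = 10)
df <- data.frame(
  GaugeID = rep(c("A", "B"), each = 10),
  PETvP   = rep(x, 2),
  stringsAsFactors = FALSE
)
df$ETvP <- curve_fn(df$PETvP, rep(c(2.5, 3.5), each = 10)) + rnorm(20, sd = 0.01)

# One row per gauge, with the fitted w as a plain numeric column
w_by_gauge <- df %>%
  group_by(GaugeID) %>%
  summarise(w = coef(nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1/w),
                         data = cur_data(),
                         start = list(w = 2)))[["w"]])

w_by_gauge
```

The result is an ordinary two-column table (GaugeID, w), which is usually easier to join back to other gauge metadata than a list of model objects.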
    

Using map :

df %>%
  group_split(GaugeID) %>%
  map(~nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), 
           data = .x, 
           start =  list(w = 2)))
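
With several hundred gauges, one fit that errors would stop the whole map call. A possible way around this is to wrap the fit in purrr::possibly() so a failed gauge yields NA instead. This is only a sketch on synthetic data: the gauges "A", "B" and "C", the helper names curve_fn and safe_fit, and the deliberately invalid negative PETvP values for gauge "C" are all assumptions for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: the curve from the question
curve_fn <- function(x, w) 1 + x - (1 + x^w)^(1/w)

# Synthetic data (an assumption): gauge "C" gets negative PETvP values,
# so PETvP^w is undefined for non-integer w and the fit is expected to fail
set.seed(1)
x <- seq(0.7, 1.6, length.out = 10)
df <- data.frame(
  GaugeID = rep(c("A", "B", "C"), each = 10),
  PETvP   = c(x, x, -x),
  stringsAsFactors = FALSE
)
df$ETvP <- curve_fn(abs(df$PETvP), rep(c(2.5, 3.5, 3), each = 10)) +
  rnorm(30, sd = 0.01)

# Return NULL instead of raising an error when a fit fails
safe_fit <- possibly(
  function(d) nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1/w),
                  data = d, start = list(w = 2)),
  otherwise = NULL
)

groups <- df %>% group_split(GaugeID)
fits   <- map(groups, safe_fit)

# Collect one w per gauge; failed fits become NA
w_table <- tibble(
  GaugeID = map_chr(groups, ~ as.character(.x$GaugeID[1])),
  w       = map_dbl(fits, ~ if (is.null(.x)) NA_real_ else coef(.x)[["w"]])
)

w_table
```

Rows with NA in w then flag the gauges that need a different starting value or a closer look at the data.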

I usually prefer purrr and dplyr for looping functions over grouped data. I can't edit the data, but maybe this works:

library(dplyr)
library(purrr)

yearly %>%
  group_by(GaugeID) %>%
  # cur_data() is the current group's rows; list() lets each model occupy one cell
  summarise(test = list(nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)),
                            data = cur_data(), start = list(w = 2), trace = TRUE)))
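
For comparison, the for loop in the question fails because nls treats every name in the formula that is not a parameter as a data variable. That includes i, which has length 1 while the columns have 20 rows, hence "variable lengths differ (found for 'i')". Subsetting the data frame inside the loop and storing each coefficient avoids this. A minimal base R sketch on synthetic data (the gauge labels, true w values, and the names curve_fn, ids and w_out are assumptions for illustration):

```r
# Hypothetical helper: the curve from the question
curve_fn <- function(x, w) 1 + x - (1 + x^w)^(1/w)

# Synthetic data (an assumption): two gauges simulated with known w plus small noise
set.seed(1)
x <- seq(0.7, 1.6, length.out = 10)
yearly <- data.frame(
  GaugeID = rep(c("A", "B"), each = 10),
  PETvP   = rep(x, 2),
  stringsAsFactors = FALSE
)
yearly$ETvP <- curve_fn(yearly$PETvP, rep(c(2.5, 3.5), each = 10)) +
  rnorm(20, sd = 0.01)

ids   <- unique(yearly$GaugeID)
w_out <- setNames(rep(NA_real_, length(ids)), ids)  # one slot per gauge

for (id in ids) {
  # Subset the rows for this gauge instead of indexing inside the formula
  gauge_data <- yearly[yearly$GaugeID == id, ]
  fit <- nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1/w),
             data = gauge_data, start = list(w = 2))
  w_out[id] <- coef(fit)[["w"]]  # keep the result instead of overwriting it
}

w_out
```

Note that the original loop also overwrote myValue on every iteration, so even a working fit would have kept only the last gauge.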

A single model can be formulated instead, eliminating the loop. Make sure that GaugeID is a factor, index w by GaugeID in the formula, and provide a starting-value list whose w component is a vector with one starting value for each level of GaugeID.

df$GaugeID <- factor(df$GaugeID)
fo <- ETvP ~ 1 + PETvP - (1 + PETvP^(w[GaugeID]))^(1/w[GaugeID])
st <- list(w = rep(2, nlevels(df$GaugeID)))
nls(fo, df, start = st)
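
In the single-model fit, coef() returns the parameters as w1, w2, ... in the order of the factor levels, because w[GaugeID] indexes w by the factor's integer codes. They can be relabelled with the gauge IDs as sketched below. The data here are synthetic, and the gauge labels, true w values, and the names curve_fn and w_by_gauge are assumptions for illustration.

```r
# Hypothetical helper: the curve from the question
curve_fn <- function(x, w) 1 + x - (1 + x^w)^(1/w)

# Synthetic data (an assumption): two gauges simulated with known w plus small noise
set.seed(1)
x <- seq(0.7, 1.6, length.out = 10)
df <- data.frame(
  GaugeID = rep(c("A", "B"), each = 10),
  PETvP   = rep(x, 2),
  stringsAsFactors = FALSE
)
df$ETvP <- curve_fn(df$PETvP, rep(c(2.5, 3.5), each = 10)) + rnorm(20, sd = 0.01)

# Same steps as above: factor, indexed parameter, one start value per level
df$GaugeID <- factor(df$GaugeID)
fo  <- ETvP ~ 1 + PETvP - (1 + PETvP^(w[GaugeID]))^(1/w[GaugeID])
st  <- list(w = rep(2, nlevels(df$GaugeID)))
fit <- nls(fo, df, start = st)

# Relabel w1, w2, ... with the corresponding gauge IDs
w_by_gauge <- setNames(coef(fit), levels(df$GaugeID))

w_by_gauge
```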

