简体   繁体   中英

How do I limit x-range of spline() interpolation to first and last non-NA value in dplyr?

I want to interpolate missing values using dplyr, piping, and spline() .

Data:

test <- structure(list(site = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("lake", "stream", "wetland"
    ), class = "factor"), depth = c(0L, -3L, -4L, -8L, -10L, -14L, 
    0L, -1L, -3L, -5L, 0L, -2L, -4L, -6L), var1 = c(NA, 1L, 3L, NA, 
    6L, NA, 1L, 2L, NA, 4L, 1L, NA, NA, 4L), var2 = c(1L, NA, 3L, 
    4L, 8L, NA, NA, NA, NA, NA, NA, 2L, NA, NA)), .Names = c("site", 
    "depth", "var1", "var2"), class = "data.frame", row.names = c(NA, 
    -14L))

Q1: How do I use the following functioning code, but limit the range of interpolation to occur between the first non- NA value and the last non- NA value for each variable. For example, it should only interpolate var1 for wetland at depth -8 and return NA for depths 0 and -14 .

library(tidyverse)

test_int <- test %>% 
    group_by(site) %>% 
    mutate_at(vars(c(var1, var2)),
              funs("i" = if(sum(!is.na(.)) > 1) 
                             spline(x=depth, y=., xout=depth)[["y"]]
                         else
                             NA)) 

Q2: Is there a way to bound my interpolated values from 0 to Inf ? Or is this not appropriate with spline (eg, I should use another interpolation method such as smooth or loess )?

Not pretty, but capable of filtering out the excess values. Side effect is that it filters out interpolated values beyond the min and max limits as well.

test_clean <- 
    test %>% 
    group_by(site) %>% 
    mutate_at(vars(c(var1, var2)),
              funs(c("c" = if(sum(!is.na(.)) > 1) 
                            spline(x=depth, y=., xout=depth)[["y"]]
                        else NA),
                    "min" = min(., na.rm = TRUE),
                    "max" = max(., na.rm = TRUE)
                   )
              ) %>% 
    mutate(var1_i = if_else(var1_c >= var1_min & var1_c <= var1_max, var1_c, NA_real_),
           var2_i = if_else(var2_c >= var2_min & var2_c <= var2_max, var2_c, NA_real_)) %>% 
    select(site:var2, ends_with("i"))

test_clean
# A tibble: 14 x 6
# Groups:   site [3]
      site depth  var1  var2 var1_i   var2_i
    <fctr> <int> <int> <int>  <dbl>    <dbl>
 1 wetland     0    NA     1     NA 1.000000
 2 wetland    -3     1    NA    1.0 3.078125
 3 wetland    -4     3     3    3.0 3.000000
 4 wetland    -8    NA     4     NA 4.000000
 5 wetland   -10     6     8    6.0 8.000000
 6 wetland   -14    NA    NA     NA       NA
 7    lake     0     1    NA    1.0       NA
 8    lake    -1     2    NA    2.0       NA
 9    lake    -3    NA    NA    3.4       NA
10    lake    -5     4    NA    4.0       NA
11  stream     0     1    NA    1.0       NA
12  stream    -2    NA     2    2.0       NA
13  stream    -4    NA    NA    3.0       NA
14  stream    -6     4    NA    4.0       NA

and to help everyone working on improving this or proofing the steps that took place on the way to the final dataframe, here's the dataframe with the intermediate steps included:

# A tibble: 14 x 12
# Groups:   site [3]
      site depth  var1  var2     var1_c    var2_c var1_min var2_min var1_max var2_max var1_i   var2_i
    <fctr> <int> <int> <int>      <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>  <dbl>    <dbl>
 1 wetland     0    NA     1 -7.5714286  1.000000        1        1        6        8     NA 1.000000
 2 wetland    -3     1    NA  1.0000000  3.078125        1        1        6        8    1.0 3.078125
 3 wetland    -4     3     3  3.0000000  3.000000        1        1        6        8    3.0 3.000000
 4 wetland    -8    NA     4  6.7142857  4.000000        1        1        6        8     NA 4.000000
 5 wetland   -10     6     8  6.0000000  8.000000        1        1        6        8    6.0 8.000000
 6 wetland   -14    NA    NA -0.5714286 30.750000        1        1        6        8     NA       NA
 7    lake     0     1    NA  1.0000000        NA        1      Inf        4     -Inf    1.0       NA
 8    lake    -1     2    NA  2.0000000        NA        1      Inf        4     -Inf    2.0       NA
 9    lake    -3    NA    NA  3.4000000        NA        1      Inf        4     -Inf    3.4       NA
10    lake    -5     4    NA  4.0000000        NA        1      Inf        4     -Inf    4.0       NA
11  stream     0     1    NA  1.0000000        NA        1        2        4        2    1.0       NA
12  stream    -2    NA     2  2.0000000        NA        1        2        4        2    2.0       NA
13  stream    -4    NA    NA  3.0000000        NA        1        2        4        2    3.0       NA
14  stream    -6     4    NA  4.0000000        NA        1        2        4        2    4.0       NA

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM