How do I limit the x range of spline interpolation to the first and last non-NA values in dplyr?

Question

I want to interpolate missing values using dplyr, piping, and spline().

Data:

test <- structure(list(site = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("lake", "stream", "wetland"
    ), class = "factor"), depth = c(0L, -3L, -4L, -8L, -10L, -14L, 
    0L, -1L, -3L, -5L, 0L, -2L, -4L, -6L), var1 = c(NA, 1L, 3L, NA, 
    6L, NA, 1L, 2L, NA, 4L, 1L, NA, NA, 4L), var2 = c(1L, NA, 3L, 
    4L, 8L, NA, NA, NA, NA, NA, NA, 2L, NA, NA)), .Names = c("site", 
    "depth", "var1", "var2"), class = "data.frame", row.names = c(NA, 
    -14L))

Q1: How can I use the following working code but restrict interpolation to the range between the first and last non-NA values of each variable? For example, it should interpolate var1 for wetland only at depth -8 and return NA for depths 0 and -14.

library(tidyverse)

test_int <- test %>% 
    group_by(site) %>% 
    mutate_at(vars(c(var1, var2)),
              funs("i" = if(sum(!is.na(.)) > 1) 
                             spline(x=depth, y=., xout=depth)[["y"]]
                         else
                             NA))

Q2: Is there a way to bound my interpolated values between 0 and Inf? Or is this not appropriate with spline (e.g., should I use another interpolation method such as smooth or loess)?
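On Q2, a minimal sketch of two possibilities, offered as assumptions rather than anything the thread settles: clamp the spline output at 0 with pmax(), or switch to linear interpolation with approx(), which cannot overshoot the observed values and, with rule = 1, returns NA outside the observed x range. The vectors below copy the wetland rows of the data above.

```r
# Wetland rows from the question's data
depth <- c(0, -3, -4, -8, -10, -14)
var1  <- c(NA, 1, 3, NA, 6, NA)

ok <- !is.na(var1)

# Option A (assumption): keep the spline, then clamp negative values to 0.
# This only enforces the lower bound; it does not change the spline's shape.
s <- spline(x = depth[ok], y = var1[ok], xout = depth)[["y"]]
s_clamped <- pmax(s, 0)

# Option B (assumption): linear interpolation. Values stay within the range
# of the observed data, and rule = 1 yields NA outside the observed depths.
l <- approx(x = depth[ok], y = var1[ok], xout = depth, rule = 1)[["y"]]

print(s_clamped)
print(l)
```

Clamping is the smaller change but can produce flat runs of zeros where the spline dips below 0; linear interpolation avoids overshoot entirely at the cost of smoothness.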

Answer 1

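Not pretty, but it filters out the excess values. A side effect is that it also drops interpolated values that fall outside the observed min and max of each variable within a site, even when they lie between the first and last non-NA depths. For example, var1 for wetland at depth -8 interpolates to 6.71, above the observed maximum of 6, so it comes back as NA.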

test_clean <- 
    test %>% 
    group_by(site) %>% 
    mutate_at(vars(c(var1, var2)),
              funs(c("c" = if(sum(!is.na(.)) > 1) 
                            spline(x=depth, y=., xout=depth)[["y"]]
                        else NA),
                    "min" = min(., na.rm = TRUE),
                    "max" = max(., na.rm = TRUE)
                   )
              ) %>% 
    mutate(var1_i = if_else(var1_c >= var1_min & var1_c <= var1_max, var1_c, NA_real_),
           var2_i = if_else(var2_c >= var2_min & var2_c <= var2_max, var2_c, NA_real_)) %>% 
    select(site:var2, ends_with("i"))

test_clean
# A tibble: 14 x 6
# Groups:   site [3]
      site depth  var1  var2 var1_i   var2_i
    <fctr> <int> <int> <int>  <dbl>    <dbl>
 1 wetland     0    NA     1     NA 1.000000
 2 wetland    -3     1    NA    1.0 3.078125
 3 wetland    -4     3     3    3.0 3.000000
 4 wetland    -8    NA     4     NA 4.000000
 5 wetland   -10     6     8    6.0 8.000000
 6 wetland   -14    NA    NA     NA       NA
 7    lake     0     1    NA    1.0       NA
 8    lake    -1     2    NA    2.0       NA
 9    lake    -3    NA    NA    3.4       NA
10    lake    -5     4    NA    4.0       NA
11  stream     0     1    NA    1.0       NA
12  stream    -2    NA     2    2.0       NA
13  stream    -4    NA    NA    3.0       NA
14  stream    -6     4    NA    4.0       NA

To help anyone improving this or checking the steps that led to the final data frame, here is the data frame with the intermediate columns included:

# A tibble: 14 x 12
# Groups:   site [3]
      site depth  var1  var2     var1_c    var2_c var1_min var2_min var1_max var2_max var1_i   var2_i
    <fctr> <int> <int> <int>      <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>  <dbl>    <dbl>
 1 wetland     0    NA     1 -7.5714286  1.000000        1        1        6        8     NA 1.000000
 2 wetland    -3     1    NA  1.0000000  3.078125        1        1        6        8    1.0 3.078125
 3 wetland    -4     3     3  3.0000000  3.000000        1        1        6        8    3.0 3.000000
 4 wetland    -8    NA     4  6.7142857  4.000000        1        1        6        8     NA 4.000000
 5 wetland   -10     6     8  6.0000000  8.000000        1        1        6        8    6.0 8.000000
 6 wetland   -14    NA    NA -0.5714286 30.750000        1        1        6        8     NA       NA
 7    lake     0     1    NA  1.0000000        NA        1      Inf        4     -Inf    1.0       NA
 8    lake    -1     2    NA  2.0000000        NA        1      Inf        4     -Inf    2.0       NA
 9    lake    -3    NA    NA  3.4000000        NA        1      Inf        4     -Inf    3.4       NA
10    lake    -5     4    NA  4.0000000        NA        1      Inf        4     -Inf    4.0       NA
11  stream     0     1    NA  1.0000000        NA        1        2        4        2    1.0       NA
12  stream    -2    NA     2  2.0000000        NA        1        2        4        2    2.0       NA
13  stream    -4    NA    NA  3.0000000        NA        1        2        4        2    3.0       NA
14  stream    -6     4    NA  4.0000000        NA        1        2        4        2    4.0       NA
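
The min/max filter above works on the interpolated values, not on depth. As an alternative, here is a minimal sketch that masks by x range instead, so that only depths between the first and last non-NA observation receive a value. The helper name spline_inside is hypothetical (it is not from the thread), and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
library(dplyr)

# Same values as the question's `test` data, rebuilt for a self-contained run
test <- data.frame(
  site  = factor(rep(c("wetland", "lake", "stream"), times = c(6, 4, 4))),
  depth = c(0, -3, -4, -8, -10, -14, 0, -1, -3, -5, 0, -2, -4, -6),
  var1  = c(NA, 1, 3, NA, 6, NA, 1, 2, NA, 4, 1, NA, NA, 4),
  var2  = c(1, NA, 3, 4, 8, NA, NA, NA, NA, NA, NA, 2, NA, NA)
)

# Hypothetical helper: spline-interpolate y over x, but return NA for any x
# outside the range spanned by the non-NA observations of y.
spline_inside <- function(x, y) {
  ok <- !is.na(y)
  # As in the question's code, give up (all NA) with fewer than two points
  if (sum(ok) < 2) return(rep(NA_real_, length(y)))
  out <- spline(x = x[ok], y = y[ok], xout = x)[["y"]]
  inside <- x >= min(x[ok]) & x <= max(x[ok])
  out[!inside] <- NA_real_
  out
}

test_int <- test %>%
  group_by(site) %>%
  mutate(var1_i = spline_inside(depth, var1),
         var2_i = spline_inside(depth, var2)) %>%
  ungroup()

print(test_int)
```

With this masking, var1 for wetland keeps its interpolated value at depth -8 and is NA at depths 0 and -14, which is the behaviour Q1 asks for.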

Answer 1 posted 2017-10-26 18:55:29
