I want to interpolate missing values using dplyr, piping, and spline().
Data:
test <- structure(list(site = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("lake", "stream", "wetland"
), class = "factor"), depth = c(0L, -3L, -4L, -8L, -10L, -14L,
0L, -1L, -3L, -5L, 0L, -2L, -4L, -6L), var1 = c(NA, 1L, 3L, NA,
6L, NA, 1L, 2L, NA, 4L, 1L, NA, NA, 4L), var2 = c(1L, NA, 3L,
4L, 8L, NA, NA, NA, NA, NA, NA, 2L, NA, NA)), .Names = c("site",
"depth", "var1", "var2"), class = "data.frame", row.names = c(NA,
-14L))
Q1: How do I use the following working code but limit interpolation to the range between the first non-NA value and the last non-NA value for each variable? For example, it should only interpolate var1 for wetland at depth -8 and return NA for depths 0 and -14.
library(tidyverse)
test_int <- test %>%
  group_by(site) %>%
  mutate_at(vars(c(var1, var2)),
            funs("i" = if(sum(!is.na(.)) > 1)
                         spline(x=depth, y=., xout=depth)[["y"]]
                       else
                         NA))
Q2: Is there a way to bound my interpolated values to the range 0 to Inf? Or is this not appropriate with spline (e.g., should I use another interpolation method such as smooth or loess)?
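Not pretty, but capable of filtering out the excess values. A side effect is that it also filters out interpolated values inside the sampled depth range whenever they fall beyond the min and max of the observed values (for example, var1 for wetland at depth -8, where the spline gives 6.71 against an observed maximum of 6).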
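test_clean <-
  test %>%
  group_by(site) %>%
  mutate_at(vars(c(var1, var2)),
            # _c: spline evaluated at every depth (this still extrapolates)
            funs(c("c" = if(sum(!is.na(.)) > 1)
                           spline(x=depth, y=., xout=depth)[["y"]]
                         else NA),
                 # _min / _max: observed range per site; an all-NA group
                 # yields Inf / -Inf with a warning
                 "min" = min(., na.rm = TRUE),
                 "max" = max(., na.rm = TRUE)
            )
  ) %>%
  # keep a spline value only if it lies within the observed range
  mutate(var1_i = if_else(var1_c >= var1_min & var1_c <= var1_max, var1_c, NA_real_),
         var2_i = if_else(var2_c >= var2_min & var2_c <= var2_max, var2_c, NA_real_)) %>%
  select(site:var2, ends_with("i"))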
test_clean
# A tibble: 14 x 6
# Groups:   site [3]
      site depth  var1  var2 var1_i   var2_i
    <fctr> <int> <int> <int>  <dbl>    <dbl>
 1 wetland     0    NA     1     NA 1.000000
 2 wetland    -3     1    NA    1.0 3.078125
 3 wetland    -4     3     3    3.0 3.000000
 4 wetland    -8    NA     4     NA 4.000000
 5 wetland   -10     6     8    6.0 8.000000
 6 wetland   -14    NA    NA     NA       NA
 7    lake     0     1    NA    1.0       NA
 8    lake    -1     2    NA    2.0       NA
 9    lake    -3    NA    NA    3.4       NA
10    lake    -5     4    NA    4.0       NA
11  stream     0     1    NA    1.0       NA
12  stream    -2    NA     2    2.0       NA
13  stream    -4    NA    NA    3.0       NA
14  stream    -6     4    NA    4.0       NA
To help anyone who wants to improve this or check the steps taken on the way to the final data frame, here is the data frame with the intermediate columns included:
# A tibble: 14 x 12
# Groups:   site [3]
      site depth  var1  var2     var1_c    var2_c var1_min var2_min var1_max var2_max var1_i   var2_i
    <fctr> <int> <int> <int>      <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>  <dbl>    <dbl>
 1 wetland     0    NA     1 -7.5714286  1.000000        1        1        6        8     NA 1.000000
 2 wetland    -3     1    NA  1.0000000  3.078125        1        1        6        8    1.0 3.078125
 3 wetland    -4     3     3  3.0000000  3.000000        1        1        6        8    3.0 3.000000
 4 wetland    -8    NA     4  6.7142857  4.000000        1        1        6        8     NA 4.000000
 5 wetland   -10     6     8  6.0000000  8.000000        1        1        6        8    6.0 8.000000
 6 wetland   -14    NA    NA -0.5714286 30.750000        1        1        6        8     NA       NA
 7    lake     0     1    NA  1.0000000        NA        1      Inf        4     -Inf    1.0       NA
 8    lake    -1     2    NA  2.0000000        NA        1      Inf        4     -Inf    2.0       NA
 9    lake    -3    NA    NA  3.4000000        NA        1      Inf        4     -Inf    3.4       NA
10    lake    -5     4    NA  4.0000000        NA        1      Inf        4     -Inf    4.0       NA
11  stream     0     1    NA  1.0000000        NA        1        2        4        2    1.0       NA
12  stream    -2    NA     2  2.0000000        NA        1        2        4        2    2.0       NA
13  stream    -4    NA    NA  3.0000000        NA        1        2        4        2    3.0       NA
14  stream    -6     4    NA  4.0000000        NA        1        2        4        2    4.0       NA
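If losing in-range values such as var1 for wetland at depth -8 is a problem, one possible variation is to mask by depth instead of by value: keep the spline result only where depth lies between the shallowest and deepest depths that have an observed (non-NA) value. The sketch below is not part of the original answer. `spline_within_range` and its `lower` argument are hypothetical names, it assumes `x` itself has no NA, and clamping with `pmax()` is only one simple way to approach Q2, not a claim that it is statistically appropriate.

```r
# Hypothetical helper: spline-interpolate y over x, returning NA outside the
# range of x values that have an observed (non-NA) y.
spline_within_range <- function(x, y, lower = -Inf) {
  ok <- !is.na(y)
  # Fewer than two observations: nothing to interpolate between.
  if (sum(ok) < 2) return(rep(NA_real_, length(y)))
  out <- spline(x = x[ok], y = y[ok], xout = x)[["y"]]
  # Blank out extrapolated values (x beyond the observed span).
  outside <- x < min(x[ok]) | x > max(x[ok])
  out[outside] <- NA_real_
  # Optional floor (e.g. lower = 0 for Q2); NA values stay NA.
  pmax(out, lower)
}

# var1 for wetland, taken from the question's data
depth <- c(0, -3, -4, -8, -10, -14)
var1  <- c(NA, 1, 3, NA, 6, NA)
wetland_i <- spline_within_range(depth, var1)
print(wetland_i)
```

Here depths 0 and -14 come back as NA while depth -8 keeps its interpolated value (about 6.71, the same as var1_c above). Inside the pipeline this could stand in for the spline call after `group_by(site)`, for example `mutate(across(c(var1, var2), ~ spline_within_range(depth, .x), .names = "{.col}_i"))`, assuming dplyr 1.0 or later.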