How do I limit the x-range of spline() interpolation to the first and last non-NA values in dplyr?
I would like to use dplyr, piping, and spline() to interpolate missing values.
Data:
test <- structure(list(site = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("lake", "stream", "wetland"
), class = "factor"), depth = c(0L, -3L, -4L, -8L, -10L, -14L,
0L, -1L, -3L, -5L, 0L, -2L, -4L, -6L), var1 = c(NA, 1L, 3L, NA,
6L, NA, 1L, 2L, NA, 4L, 1L, NA, NA, 4L), var2 = c(1L, NA, 3L,
4L, 8L, NA, NA, NA, NA, NA, NA, 2L, NA, NA)), .Names = c("site",
"depth", "var1", "var2"), class = "data.frame", row.names = c(NA,
-14L))
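Q1: How can I use the following working code but restrict the interpolation to the range between the first and last non-NA value of each variable? For example, for wetland it should interpolate var1 only at depth -8 and return NA for depths 0 and -14.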
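library(tidyverse)

# Spline-interpolate var1 and var2 within each site. A column needs at least
# two observed (non-NA) values; otherwise it becomes NA for that site.
# across() replaces the deprecated funs(); output columns are var1_i, var2_i.
test_int <- test %>%
  group_by(site) %>%
  mutate(across(c(var1, var2),
                ~ if (sum(!is.na(.x)) > 1)
                    spline(x = depth, y = .x, xout = depth)[["y"]]
                  else
                    NA_real_,
                .names = "{.col}_i"))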
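For Q1, one way to restrict the spline to the observed depth range is to fit it to the non-NA points only and then set every output depth outside the first and last observed depth to NA. The sketch below is base R; spline_within_range is a hypothetical helper name, and the test vectors are the wetland depth and var1 values from the data above.

```r
# Hypothetical helper (not from any package): spline-interpolate y at every x,
# but only inside the range of x where y was actually observed.
spline_within_range <- function(x, y) {
  ok <- !is.na(y)
  # spline() needs at least two observed points; otherwise return all NA
  if (sum(ok) < 2) return(rep(NA_real_, length(y)))
  # Fit on the observed points only, evaluate at every x (default "fmm")
  fitted <- spline(x = x[ok], y = y[ok], xout = x)$y
  # Values outside [first, last] observed x are extrapolations: drop them
  outside <- x < min(x[ok]) | x > max(x[ok])
  fitted[outside] <- NA_real_
  fitted
}

# Wetland depths and var1 from the question's data
depth <- c(0, -3, -4, -8, -10, -14)
var1  <- c(NA, 1, 3, NA, 6, NA)
spline_within_range(depth, var1)
```

With the wetland values this keeps the interpolated value at depth -8 and returns NA at depths 0 and -14. Inside the grouped pipeline it could be called per column, e.g. mutate(across(c(var1, var2), ~ spline_within_range(depth, .x), .names = "{.col}_i")), assuming dplyr 1.0 or later for across().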
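Q2: Is there a way to bound the interpolated values to the range 0 to Inf? Or is spline() unsuitable for this (i.e., should I use a different interpolation method, such as smooth or loess)?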
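For Q2, two simple options are sketched below in base R; both helper names are hypothetical. Option A keeps the cubic spline and clamps its output to [0, Inf) with pmax()/pmin(); this removes negative values but does not change the shape of the curve. Option B swaps the spline for linear interpolation with stats::approx(): each interpolated value is a weighted average of its two neighbouring observations, so it cannot leave their range (non-negative data stay non-negative), and rule = 1 returns NA outside the observed depths instead of extrapolating.

```r
# Option A (hypothetical helper): cubic spline, output clamped to [lower, upper]
spline_clamped <- function(x, y, lower = 0, upper = Inf) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(rep(NA_real_, length(y)))
  fitted <- spline(x = x[ok], y = y[ok], xout = x)$y
  pmin(pmax(fitted, lower), upper)
}

# Option B (hypothetical helper): linear interpolation with approx();
# rule = 1 returns NA for x outside the observed range (no extrapolation)
linear_within_range <- function(x, y) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(rep(NA_real_, length(y)))
  approx(x = x[ok], y = y[ok], xout = x, rule = 1)$y
}

# Wetland depths and var1 from the question's data
depth <- c(0, -3, -4, -8, -10, -14)
var1  <- c(NA, 1, 3, NA, 6, NA)
spline_clamped(depth, var1)
linear_within_range(depth, var1)
```

With the wetland var1 values, Option A turns the negative extrapolations at depths 0 and -14 into 0, while Option B gives 5 at depth -8 (the straight line between 6 at -10 and 3 at -4) and NA at depths 0 and -14.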
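Not pretty, but it does filter out the unwanted values. The side effect is that it filters on the value range rather than the depth range: any interpolated value outside the variable's observed min and max is dropped, including values at depths that lie between the first and last observation.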
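The side effect can be reproduced for a single site in base R. The sketch below uses wetland var1 from the question's data and compares the value-range rule used here with a depth-range rule; the variable names are illustrative only.

```r
depth <- c(0, -3, -4, -8, -10, -14)
var1  <- c(NA, 1, 3, NA, 6, NA)
ok    <- !is.na(var1)

# Spline over all depths, fitted to the observed points only
var1_c <- spline(x = depth[ok], y = var1[ok], xout = depth)$y

# Value-range rule: keep values inside the observed min/max of var1
keep_by_value <- var1_c >= min(var1[ok]) & var1_c <= max(var1[ok])

# Depth-range rule (what Q1 asks for): keep depths inside the observed depths
keep_by_depth <- depth >= min(depth[ok]) & depth <= max(depth[ok])

data.frame(depth, var1, var1_c, keep_by_value, keep_by_depth)
```

At depth -8 the spline gives about 6.71, above the observed maximum of 6, so the value rule drops it while the depth rule keeps it.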
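test_clean <- test %>%
  group_by(site) %>%
  # Per variable: spline over all depths (_c) plus the observed min and max
  # within the site (an all-NA column warns and gives Inf / -Inf).
  mutate(across(c(var1, var2),
                list(c   = ~ if (sum(!is.na(.x)) > 1)
                               spline(x = depth, y = .x, xout = depth)[["y"]]
                             else NA_real_,
                     min = ~ min(.x, na.rm = TRUE),
                     max = ~ max(.x, na.rm = TRUE)),
                .names = "{.col}_{.fn}")) %>%
  # Keep a spline value only if it falls inside the observed value range
  mutate(var1_i = if_else(var1_c >= var1_min & var1_c <= var1_max, var1_c, NA_real_),
         var2_i = if_else(var2_c >= var2_min & var2_c <= var2_max, var2_c, NA_real_)) %>%
  select(site:var2, ends_with("_i"))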
test_clean
# A tibble: 14 x 6
# Groups: site [3]
site depth var1 var2 var1_i var2_i
<fctr> <int> <int> <int> <dbl> <dbl>
1 wetland 0 NA 1 NA 1.000000
2 wetland -3 1 NA 1.0 3.078125
3 wetland -4 3 3 3.0 3.000000
4 wetland -8 NA 4 NA 4.000000
5 wetland -10 6 8 6.0 8.000000
6 wetland -14 NA NA NA NA
7 lake 0 1 NA 1.0 NA
8 lake -1 2 NA 2.0 NA
9 lake -3 NA NA 3.4 NA
10 lake -5 4 NA 4.0 NA
11 stream 0 1 NA 1.0 NA
12 stream -2 NA 2 2.0 NA
13 stream -4 NA NA 3.0 NA
14 stream -6 4 NA 4.0 NA
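To help anyone improve this, or double-check the steps on the way to the final data frame, here is the data frame with the intermediate columns included: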
# A tibble: 14 x 12
# Groups: site [3]
site depth var1 var2 var1_c var2_c var1_min var2_min var1_max var2_max var1_i var2_i
<fctr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 wetland 0 NA 1 -7.5714286 1.000000 1 1 6 8 NA 1.000000
2 wetland -3 1 NA 1.0000000 3.078125 1 1 6 8 1.0 3.078125
3 wetland -4 3 3 3.0000000 3.000000 1 1 6 8 3.0 3.000000
4 wetland -8 NA 4 6.7142857 4.000000 1 1 6 8 NA 4.000000
5 wetland -10 6 8 6.0000000 8.000000 1 1 6 8 6.0 8.000000
6 wetland -14 NA NA -0.5714286 30.750000 1 1 6 8 NA NA
7 lake 0 1 NA 1.0000000 NA 1 Inf 4 -Inf 1.0 NA
8 lake -1 2 NA 2.0000000 NA 1 Inf 4 -Inf 2.0 NA
9 lake -3 NA NA 3.4000000 NA 1 Inf 4 -Inf 3.4 NA
10 lake -5 4 NA 4.0000000 NA 1 Inf 4 -Inf 4.0 NA
11 stream 0 1 NA 1.0000000 NA 1 2 4 2 1.0 NA
12 stream -2 NA 2 2.0000000 NA 1 2 4 2 2.0 NA
13 stream -4 NA NA 3.0000000 NA 1 2 4 2 3.0 NA
14 stream -6 4 NA 4.0000000 NA 1 2 4 2 4.0 NA
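The wetland var1_c values in this table can be checked by hand. Writing d for depth, the three observed points are (-10, 6), (-4, 3) and (-3, 1); their divided differences are (3 - 6)/6 = -1/2, (1 - 3)/1 = -2 and (-2 + 1/2)/7 = -3/14. The table values agree with the quadratic through these points in Newton form (this is consistent with spline()'s default "fmm" method reducing to a parabola when there are only three knots, but it is checked here only against the table):

```latex
\begin{aligned}
p(d)   &= 6 - \tfrac{1}{2}(d + 10) - \tfrac{3}{14}(d + 10)(d + 4),\\
p(-8)  &= 6 - 1 + \tfrac{12}{7} = \tfrac{47}{7} \approx 6.714,\\
p(0)   &= 6 - 5 - \tfrac{60}{7} = -\tfrac{53}{7} \approx -7.571,\\
p(-14) &= 6 + 2 - \tfrac{60}{7} = -\tfrac{4}{7} \approx -0.571.
\end{aligned}
```

Since 47/7 exceeds var1_max = 6, row 4 is dropped by the value-range filter even though depth -8 lies between the first and last observed depths, while p(0) and p(-14) are extrapolations below var1_min = 1.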