I'm trying to linearly interpolate values within a group using dplyr and approx(). Unfortunately, some of the groups have all missing values, so I'd like the approximation to just skip those groups and proceed for the remainder. I don't want to extrapolate or use the nearest neighbouring observation's data.
Here's an example of the data. The first group (by id) is entirely missing; the other should be interpolated.
data <- read.csv(text="
id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")
dataIpol <- data %>%
  group_by(id) %>%
  arrange(id, year) %>%
  mutate(valueIpol = approx(year, value, year,
                            method = "linear", rule = 1, f = 0, ties = mean)$y)
But then I get the error
Error: need at least two non-NA values to interpolate
I don't get this error if I remove the groups that are entirely missing, but that's not feasible.
We can fix this by adding a filter step that keeps only groups with the required number of data points:
library(dplyr)
dataIpol <- data %>%
  group_by(id) %>%
  arrange(id, year) %>%
  filter(sum(!is.na(value)) >= 2) %>%  # keep groups with at least two non-NA values
  mutate(valueIpol = approx(year, value, year,
                            method = "linear", rule = 1, f = 0, ties = mean)$y)
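Here we count the non-NA items in the value column and remove any group that has fewer than two, since approx() needs at least two non-NA points to interpolate. Note that the filtered groups are dropped from the result entirely.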
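If the all-NA groups need to stay in the output rather than be dropped by the filter, one possible variation is to make the interpolation itself conditional. The sketch below is not part of the original answer: interp_or_na is a hypothetical helper name, and it assumes the years within each group are distinct, so that two non-NA values really do give approx() two usable points.

```r
library(dplyr)

data <- read.csv(text = "id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")

# Hypothetical helper: interpolate only when the group has at least
# two non-NA values; otherwise return NA for every row of the group.
interp_or_na <- function(x, y) {
  if (sum(!is.na(y)) >= 2) {
    # rule = 1 returns NA outside the observed range (no extrapolation)
    approx(x, y, xout = x, method = "linear", rule = 1)$y
  } else {
    rep(NA_real_, length(y))
  }
}

dataIpol <- data %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(valueIpol = interp_or_na(year, value)) %>%
  ungroup()
```

With this approach the c1 rows remain in dataIpol with valueIpol set to NA, while c2 is interpolated between its two observed values.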