
Linear interpolation with dplyr, skipping groups with all missing values

I'm trying to linearly interpolate values within each group using dplyr and approx(). Unfortunately, some of the groups have all values missing, so I'd like the interpolation to skip those groups and proceed with the rest. I don't want to extrapolate or to use the nearest neighbouring observation's data.

Here's an example of the data. The first group (by id) has all values missing; the other should be interpolated.

data <- read.csv(text="
id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")

library(dplyr)

dataIpol <- data %>%
  group_by(id) %>%
  arrange(id, year) %>%
  # approx() is called once per group; it fails for group c1 (all NA)
  mutate(valueIpol = approx(year, value, year,
                            method = "linear", rule = 1, f = 0, ties = mean)$y)

But then I get the error

Error: need at least two non-NA values to interpolate

I don't get this error if I get rid of the groups that have all values missing, but that's not feasible.

We can fix this by adding a filter step that keeps only the groups with at least two non-NA values, which is the minimum approx() needs:

library(dplyr)

dataIpol <- data %>%
  group_by(id) %>%
  arrange(id, year) %>%
  # keep only groups with at least two non-NA values
  filter(sum(!is.na(value)) >= 2) %>%
  mutate(valueIpol = approx(year, value, year,
                            method = "linear", rule = 1, f = 0, ties = mean)$y)

Here we count the non-NA entries in the value column for each group and remove any group that has fewer than two. Note that the removed groups (c1 in the example) do not appear in the result at all.
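
If the all-missing groups should stay in the output, with NA in the interpolated column, one option is to branch inside mutate() instead of filtering. The following is only a minimal sketch under that assumption, using the example data from the question; the if/else guard is an illustration and not part of the answer above.

```r
library(dplyr)

# Example data from the question: c1 is all NA, c2 has two known values
data <- read.csv(text = "
id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")

dataIpol <- data %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(valueIpol = if (sum(!is.na(value)) >= 2) {
    # enough points in this group: interpolate, no extrapolation (rule = 1)
    approx(year, value, xout = year, method = "linear", rule = 1)$y
  } else {
    # fewer than two known values: keep the rows, leave the result missing
    NA_real_
  }) %>%
  ungroup()

print(dataIpol)
```

With this variant all eight rows are kept: the c1 rows get NA in valueIpol, and the c2 rows are interpolated between 14 and 18.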


 