Here is a simplified version of the data I am working with:
dat <- data.frame(
  country     = c("country1", "country2", "country3", "country1", "country2"),
  measurement = c("m1", "m1", "m1", "m2", "m2"),
  y2015 = c(NA, 15, 19, 13, 55),
  y2016 = c(NA, 17, NA, 10, NA),
  y2017 = c(14, NA, NA, 9, 45),
  y2018 = c(18, 22, 16, NA, 40)
)
I am trying to take the difference between the two non-missing values on either side of the NAs and replace the missing values using the average change per year between them.
For row 5, this would be something like c(55, 50, 45, 40).
However, it also needs to work for rows that have more than one missing value in a sequence, like row 1 and row 3. For row 1, I'd like the difference between 14 and 18 to be extended backwards, so it should look something like c(6, 10, 14, 18). Meanwhile, for row 3, the difference between 19 and 16 should be divided across the two missing years, to look something like c(19, 18, 17, 16).
Essentially, I'm looking to create a slope for each country and measurement through the available years, and to interpolate the missing values based on that.
I am trying to think of a package for this, or perhaps to write a loop. I have looked at the 'spline' package, but it does not seem to work, since I want to run a separate linear interpolation for each country and measurement.
Any thoughts would be greatly appreciated!
Use zoo::na.spline. It fills NAs column by column, so transpose the year columns before and after the call:
library(zoo)
dat[-c(1:2)] <- t(na.spline(t(dat[-c(1:2)])))
   country measurement y2015 y2016    y2017 y2018
1 country1          m1     6    10 14.00000    18
2 country2          m1    15    17 19.33333    22
3 country3          m1    19    18 17.00000    16
4 country1          m2    13    10  9.00000    10
5 country2          m2    55    50 45.00000    40
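Note that na.spline fits a cubic spline, so rows with three or more observed values are not filled strictly linearly (row 2 gets 19.33333 rather than 19.5, and row 4 curves back up to 10). If a strictly piecewise-linear fill is wanted, the following is a minimal base R sketch, not a definitive implementation. The helper name fill_linear is hypothetical, and it assumes the years are equally spaced and that gaps at either end should be extended using the slope of the nearest two observed values.

```r
# Sample data from the question
dat <- data.frame(
  country     = c("country1", "country2", "country3", "country1", "country2"),
  measurement = c("m1", "m1", "m1", "m2", "m2"),
  y2015 = c(NA, 15, 19, 13, 55),
  y2016 = c(NA, 17, NA, 10, NA),
  y2017 = c(14, NA, NA, 9, 45),
  y2018 = c(18, 22, 16, NA, 40)
)

# Hypothetical helper: linear interpolation inside the observed range,
# linear extrapolation (nearest two observed points) outside it.
fill_linear <- function(y, x = seq_along(y)) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(y)            # fewer than two points: no slope, leave as is
  xo <- x[ok]
  yo <- y[ok]
  n  <- length(xo)
  out <- approx(xo, yo, xout = x)$y     # fills interior gaps; NA outside the range
  left  <- x < xo[1]
  right <- x > xo[n]
  out[left]  <- yo[1] + (x[left]  - xo[1]) * (yo[2] - yo[1])     / (xo[2] - xo[1])
  out[right] <- yo[n] + (x[right] - xo[n]) * (yo[n] - yo[n - 1]) / (xo[n] - xo[n - 1])
  out
}

# Apply row by row to the year columns only
filled <- dat
filled[-(1:2)] <- t(apply(dat[-(1:2)], 1, fill_linear))
filled
```

With the sample data, rows 1, 3 and 5 match the values asked for in the question, row 2 becomes c(15, 17, 19.5, 22), and row 4 becomes c(13, 10, 9, 8) because the trailing gap continues the 10 to 9 slope.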