简体   繁体   中英

Is there an R function that can help lag a variable by one year by grouping in country when some years are missing?

I have searched through the forums and have not found exactly the answer to my question. I have a data set from the World Bank

library(wbstats)
Gini <- wb(indicator = c("SI.POV.GINI"),
                     startdate = 2005, enddate = 2020)
Gini <- Gini[,c("iso3c", "date", "value")]
names(Gini)
names(Gini)<-c("iso3c", "date", "Gini")
#Change date to numeric
class(Gini$date)
Gini$date<-as.numeric(Gini$date)

#Tibble:
# A tibble: 1,012 x 3
   iso3c  date  Gini
   <chr> <dbl> <dbl>
 1 ALB    2017  33.2
 2 ALB    2016  33.7
 3 ALB    2015  32.9
 4 ALB    2014  34.6
 5 ALB    2012  29  
 6 ALB    2008  30  
 7 ALB    2005  30.6
 8 DZA    2011  27.6
 9 AGO    2018  51.3
10 AGO    2008  42.7
# … with 1,002 more rows

Then I try to lag this estimate by one year

#Lag Gini
lg <- function(x)c(NA, x[1:(length(x)-1)])

Lagged.Gini<-ddply(Gini, ~ iso3c, transform, Gini.lag.1 = lg(Gini))

tibble(Lagged.Gini)

# A tibble: 1,032 x 4
   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
 2 AGO    2008  42.7       51.3
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
 7 ALB    2012  29         34.6
 8 ALB    2008  30         29  
 9 ALB    2005  30.6       30  
10 ARE    2014  32.5       NA  

Unfortunately, my problem is that when years are missing the lag does not recognize that year is missing and just puts the most recent year as the lag. Ex: country "ALB"'s Gini estimate is not lagged by one year in 2012 it lags to the next year which is 2008.

I would want the final data to look the same but how I edited below -- and ideally to be able to lag for multiple years:

# A tibble: 1,032 x 4

   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
   AGO   2017   NA        51.3
 2 AGO    2008  42.7       NA
   AGO    2007  NA        42.7
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
   ALB    2013   NA         29
 7 ALB    2012  29         NA
 8 ALB    2008  30         29  
 9 ALB    2005  30.6       30  
10 ARE    2014  32.5       NA  

pseudospin's answer is great for base R. Since you're using tibbles, here's a tidyverse version with the same effect:

Gini <- readr::read_table("
iso3c  date  Gini
ALB    2017  33.2
ALB    2016  33.7
ALB    2015  32.9
ALB    2014  34.6
ALB    2012  29  
ALB    2008  30  
ALB    2005  30.6
DZA    2011  27.6
AGO    2018  51.3
AGO    2008  42.7")

library(dplyr)
Gini %>%
  transmute(iso3c, date = date - 1, Gini.lag.1 = Gini) %>%
  full_join(Gini, ., by = c("iso3c", "date")) %>%
  arrange(iso3c, desc(date))
# # A tibble: 17 x 4
#    iso3c  date  Gini Gini.lag.1
#    <chr> <dbl> <dbl>      <dbl>
#  1 AGO    2018  51.3       NA  
#  2 AGO    2017  NA         51.3
#  3 AGO    2008  42.7       NA  
#  4 AGO    2007  NA         42.7
#  5 ALB    2017  33.2       NA  
#  6 ALB    2016  33.7       33.2
#  7 ALB    2015  32.9       33.7
#  8 ALB    2014  34.6       32.9
#  9 ALB    2013  NA         34.6
# 10 ALB    2012  29         NA  
# 11 ALB    2011  NA         29  
# 12 ALB    2008  30         NA  
# 13 ALB    2007  NA         30  
# 14 ALB    2005  30.6       NA  
# 15 ALB    2004  NA         30.6
# 16 DZA    2011  27.6       NA  
# 17 DZA    2010  NA         27.6

If you need to do this n times (one more lag each time), you can extend it programmatically this way:

Ginilags <- lapply(1:3, function(lg) {
  z <- transmute(Gini, iso3c, date = date - lg, Gini)
  names(z)[3] <- paste0("Gini.lag.", lg)
  z
})
Reduce(function(a,b) full_join(a, b, by = c("iso3c", "date")),
       c(list(Gini), Ginilags)) %>%
  arrange(iso3c, desc(date))
# # A tibble: 28 x 6
#    iso3c  date  Gini Gini.lag.1 Gini.lag.2 Gini.lag.3
#    <chr> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
#  1 AGO    2018  51.3       NA         NA         NA  
#  2 AGO    2017  NA         51.3       NA         NA  
#  3 AGO    2016  NA         NA         51.3       NA  
#  4 AGO    2015  NA         NA         NA         51.3
#  5 AGO    2008  42.7       NA         NA         NA  
#  6 AGO    2007  NA         42.7       NA         NA  
#  7 AGO    2006  NA         NA         42.7       NA  
#  8 AGO    2005  NA         NA         NA         42.7
#  9 ALB    2017  33.2       NA         NA         NA  
# 10 ALB    2016  33.7       33.2       NA         NA  
# # ... with 18 more rows

You could create a copy of the original table, but with the date having one year subtracted off. Then just join the two together on the iso3c and date columns to get the final result as you want it.

Like this

Gini_lagged <- data.frame(
  iso3c = Gini$iso3c, 
  date = Gini$date-1, 
  Gini.lag.1 = Gini$Gini)
merge(Gini,Gini_lagged,all=TRUE)

Using dplyr & tidyr from tidyverse, you can do a rowwise mutate to find years which match the year in the current row minus 1.

library(tidyverse)

Gini %>%
     rowwise() %>%
     mutate(Gini.lag.1 = list(Gini$Gini[date-1 == Gini$date])) %>%
     unnest(c(Gini.lag.1), keep_empty = T)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM