
Is there an R function that can lag a variable by one year, grouped by country, when some years are missing?

I searched the forum and did not find an exact answer to my question. I have a World Bank dataset:

library(wbstats)
Gini <- wb(indicator = c("SI.POV.GINI"),
                     startdate = 2005, enddate = 2020)
Gini <- Gini[,c("iso3c", "date", "value")]
names(Gini)
names(Gini)<-c("iso3c", "date", "Gini")
#Change date to numeric
class(Gini$date)
Gini$date<-as.numeric(Gini$date)

#Tibble:
# A tibble: 1,012 x 3
   iso3c  date  Gini
   <chr> <dbl> <dbl>
 1 ALB    2017  33.2
 2 ALB    2016  33.7
 3 ALB    2015  32.9
 4 ALB    2014  34.6
 5 ALB    2012  29  
 6 ALB    2008  30  
 7 ALB    2005  30.6
 8 DZA    2011  27.6
 9 AGO    2018  51.3
10 AGO    2008  42.7
# … with 1,002 more rows

Then I tried to lag this estimate by one year:

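#Lag Gini
library(plyr)    # provides ddply()
library(tibble)  # provides tibble()

# Shift a vector down by one position; position only, years are not checked
lg <- function(x)c(NA, x[1:(length(x)-1)])

Lagged.Gini<-ddply(Gini, ~ iso3c, transform, Gini.lag.1 = lg(Gini))

tibble(Lagged.Gini)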

# A tibble: 1,032 x 4
   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
 2 AGO    2008  42.7       51.3
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
 7 ALB    2012  29         34.6
 8 ALB    2008  30         29  
 9 ALB    2005  30.6       30  
10 ARE    2014  32.5       NA  

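Unfortunately, my problem is that when a year is missing, the lag does not recognize the gap and simply uses the nearest available year instead. For example, for country "ALB" the 2012 Gini estimate is not shifted by one year; it is attached to the next available year, 2008.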

I would like the final data to look the same, but edited the way I show below, ideally with the ability to lag by multiple years:

# A tibble: 1,032 x 4
   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
   AGO    2017  NA         51.3
 2 AGO    2008  42.7       NA  
   AGO    2007  NA         42.7
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
   ALB    2013  NA         29  
 7 ALB    2012  29         NA  
 8 ALB    2008  30         29  
 9 ALB    2005  30.6       30  
10 ARE    2014  32.5       NA  

pseudospin's answer is great for base R. Since you are using tibbles, here is a tidyverse version with the same effect:

Gini <- readr::read_table("
iso3c  date  Gini
ALB    2017  33.2
ALB    2016  33.7
ALB    2015  32.9
ALB    2014  34.6
ALB    2012  29  
ALB    2008  30  
ALB    2005  30.6
DZA    2011  27.6
AGO    2018  51.3
AGO    2008  42.7")

library(dplyr)
Gini %>%
  transmute(iso3c, date = date - 1, Gini.lag.1 = Gini) %>%
  full_join(Gini, ., by = c("iso3c", "date")) %>%
  arrange(iso3c, desc(date))
# # A tibble: 17 x 4
#    iso3c  date  Gini Gini.lag.1
#    <chr> <dbl> <dbl>      <dbl>
#  1 AGO    2018  51.3       NA  
#  2 AGO    2017  NA         51.3
#  3 AGO    2008  42.7       NA  
#  4 AGO    2007  NA         42.7
#  5 ALB    2017  33.2       NA  
#  6 ALB    2016  33.7       33.2
#  7 ALB    2015  32.9       33.7
#  8 ALB    2014  34.6       32.9
#  9 ALB    2013  NA         34.6
# 10 ALB    2012  29         NA  
# 11 ALB    2011  NA         29  
# 12 ALB    2008  30         NA  
# 13 ALB    2007  NA         30  
# 14 ALB    2005  30.6       NA  
# 15 ALB    2004  NA         30.6
# 16 DZA    2011  27.6       NA  
# 17 DZA    2010  NA         27.6
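
A note on direction: because the join above uses date - 1, the value observed in year t + 1 is attached to year t, which matches the table requested in the question. If the conventional lag is wanted instead (the previous year's value on the current row), a minimal sketch is to shift by date + 1 and use left_join() so that no extra rows are created. The names prev_year and lagged are illustrative only, and the sample rows are copied from the question.

```r
library(dplyr)

# Sample rows copied from the question
Gini <- data.frame(
  iso3c = c(rep("ALB", 7), "DZA", "AGO", "AGO"),
  date  = c(2017, 2016, 2015, 2014, 2012, 2008, 2005, 2011, 2018, 2008),
  Gini  = c(33.2, 33.7, 32.9, 34.6, 29, 30, 30.6, 27.6, 51.3, 42.7)
)

# Relabel each observation with the following year, so that joining on
# (iso3c, date) pairs every row with the value from one year earlier
prev_year <- transmute(Gini, iso3c, date = date + 1, Gini.lag.1 = Gini)

# left_join() keeps only the original rows; rows whose previous year
# is not in the data get NA in Gini.lag.1
lagged <- Gini %>%
  left_join(prev_year, by = c("iso3c", "date")) %>%
  arrange(iso3c, desc(date))

print(lagged)
```

For a k-year lag under the same assumption, replace date + 1 with date + k.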

If you need to do this n times (lagging by one more year each time), you can extend it programmatically like this:

Ginilags <- lapply(1:3, function(lg) {
  z <- transmute(Gini, iso3c, date = date - lg, Gini)
  names(z)[3] <- paste0("Gini.lag.", lg)
  z
})
Reduce(function(a,b) full_join(a, b, by = c("iso3c", "date")),
       c(list(Gini), Ginilags)) %>%
  arrange(iso3c, desc(date))
# # A tibble: 28 x 6
#    iso3c  date  Gini Gini.lag.1 Gini.lag.2 Gini.lag.3
#    <chr> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
#  1 AGO    2018  51.3       NA         NA         NA  
#  2 AGO    2017  NA         51.3       NA         NA  
#  3 AGO    2016  NA         NA         51.3       NA  
#  4 AGO    2015  NA         NA         NA         51.3
#  5 AGO    2008  42.7       NA         NA         NA  
#  6 AGO    2007  NA         42.7       NA         NA  
#  7 AGO    2006  NA         NA         42.7       NA  
#  8 AGO    2005  NA         NA         NA         42.7
#  9 ALB    2017  33.2       NA         NA         NA  
# 10 ALB    2016  33.7       33.2       NA         NA  
# # ... with 18 more rows
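
A different technique, not taken from the answers here, is to make the missing years explicit first and then lag by position. The sketch below assumes tidyr is available and uses complete() with full_seq() to add NA rows for every missing year between each country's first and last observation; after that, dplyr::lag() is safe because consecutive rows are always one year apart. Note that this version produces conventional lags (the previous year's value) and only adds rows inside each country's observed range, so its output differs from the join-based result above. The column names Gini.lag.1 and Gini.lag.2 and the object name lagged are illustrative.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question
Gini <- data.frame(
  iso3c = c(rep("ALB", 7), "DZA", "AGO", "AGO"),
  date  = c(2017, 2016, 2015, 2014, 2012, 2008, 2005, 2011, 2018, 2008),
  Gini  = c(33.2, 33.7, 32.9, 34.6, 29, 30, 30.6, 27.6, 51.3, 42.7)
)

lagged <- Gini %>%
  group_by(iso3c) %>%
  # Add a row (with Gini = NA) for every year missing within each country
  complete(date = full_seq(date, 1)) %>%
  # Sort ascending within country so that "one row back" means "one year back"
  arrange(date, .by_group = TRUE) %>%
  mutate(
    Gini.lag.1 = dplyr::lag(Gini, 1),
    Gini.lag.2 = dplyr::lag(Gini, 2)
  ) %>%
  ungroup()

print(lagged, n = 30)
```

If the filled-in rows are not wanted in the final table, they can be dropped afterwards with filter(!is.na(Gini)).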

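You can create a copy of the original table, but with one year subtracted from the date. Then simply join the two on the iso3c and date columns to get the desired final result.

Like this: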

Gini_lagged <- data.frame(
  iso3c = Gini$iso3c, 
  date = Gini$date-1, 
  Gini.lag.1 = Gini$Gini)
merge(Gini,Gini_lagged,all=TRUE)
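
The same copy-and-merge idea can be extended to several lags in base R. The following is a minimal sketch, assuming the sample rows from the question; the names lags and merged are illustrative. It builds one shifted copy per lag and folds them together with Reduce() and merge().

```r
# Sample rows copied from the question
Gini <- data.frame(
  iso3c = c(rep("ALB", 7), "DZA", "AGO", "AGO"),
  date  = c(2017, 2016, 2015, 2014, 2012, 2008, 2005, 2011, 2018, 2008),
  Gini  = c(33.2, 33.7, 32.9, 34.6, 29, 30, 30.6, 27.6, 51.3, 42.7)
)

# One shifted copy per lag k, with the value column renamed to Gini.lag.k
lags <- lapply(1:3, function(k) {
  z <- data.frame(iso3c = Gini$iso3c, date = Gini$date - k, value = Gini$Gini)
  names(z)[3] <- paste0("Gini.lag.", k)
  z
})

# Full outer merge of the original table with every shifted copy
merged <- Reduce(
  function(a, b) merge(a, b, by = c("iso3c", "date"), all = TRUE),
  c(list(Gini), lags)
)

# Sort by country, then by year descending, as in the question
merged <- merged[order(merged$iso3c, -merged$date), ]
print(merged)
```

Passing by = c("iso3c", "date") explicitly guards against merge() picking up other shared column names once more than two tables are involved.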

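Using dplyr and tidyr from the tidyverse, you can do a row-wise mutate that looks up, within the same country, the row whose year equals the current row's year minus 1.

library(tidyverse)

Gini %>%
  rowwise() %>%
  # Inside mutate(), `Gini` is the column, so reach the data frame via .env
  mutate(Gini.lag.1 = list(
    .env$Gini$Gini[.env$Gini$iso3c == iso3c & .env$Gini$date == date - 1]
  )) %>%
  ungroup() %>%
  unnest(c(Gini.lag.1), keep_empty = TRUE)  # no match becomes NA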
