[英]Is there an R function that can help lag a variable by one year by grouping in country when some years are missing?
我搜索了論壇,並沒有找到我問題的確切答案。 我有世界銀行的數據集
library(wbstats)
Gini <- wb(indicator = c("SI.POV.GINI"),
startdate = 2005, enddate = 2020)
Gini <- Gini[,c("iso3c", "date", "value")]
names(Gini)
names(Gini)<-c("iso3c", "date", "Gini")
#Change date to numeric
class(Gini$date)
Gini$date<-as.numeric(Gini$date)
#Tibble:
# A tibble: 1,012 x 3
iso3c date Gini
<chr> <dbl> <dbl>
1 ALB 2017 33.2
2 ALB 2016 33.7
3 ALB 2015 32.9
4 ALB 2014 34.6
5 ALB 2012 29
6 ALB 2008 30
7 ALB 2005 30.6
8 DZA 2011 27.6
9 AGO 2018 51.3
10 AGO 2008 42.7
# … with 1,002 more rows
然后我嘗試將這個估計滯后一年
#Lag Gini
lg <- function(x)c(NA, x[1:(length(x)-1)])
Lagged.Gini<-ddply(Gini, ~ iso3c, transform, Gini.lag.1 = lg(Gini))
tibble(Lagged.Gini)
# A tibble: 1,032 x 4
iso3c date Gini Gini.lag.1
<chr> <dbl> <dbl> <dbl>
1 AGO 2018 51.3 NA
2 AGO 2008 42.7 51.3
3 ALB 2017 33.2 NA
4 ALB 2016 33.7 33.2
5 ALB 2015 32.9 33.7
6 ALB 2014 34.6 32.9
7 ALB 2012 29 34.6
8 ALB 2008 30 29
9 ALB 2005 30.6 30
10 ARE 2014 32.5 NA
不幸的是,我的問題是,當缺少年份時,滯后不會認識到缺少那一年,而只是將最近的一年作為滯后。 例如:國家“ALB”的基尼估計值在 2012 年沒有滯后一年,而是滯后於下一年,即 2008 年。
我希望最終數據看起來相同,但我在下面編輯的方式 - 理想情況下能夠滯后多年:
# A tibble: 1,032 x 4
iso3c date Gini Gini.lag.1
<chr> <dbl> <dbl> <dbl>
1 AGO 2018 51.3 NA
AGO 2017 NA 51.3
2 AGO 2008 42.7 NA
AGO 2007 NA 42.7
3 ALB 2017 33.2 NA
4 ALB 2016 33.7 33.2
5 ALB 2015 32.9 33.7
6 ALB 2014 34.6 32.9
ALB 2013 NA 29
7 ALB 2012 29 NA
8 ALB 2008 30 29
9 ALB 2005 30.6 30
10 ARE 2014 32.5 NA
pseudospin 的答案非常適合基礎 R。 由於您使用的是 tibbles,因此這里有一個具有相同效果的 tidyverse 版本:
Gini <- readr::read_table("
iso3c date Gini
ALB 2017 33.2
ALB 2016 33.7
ALB 2015 32.9
ALB 2014 34.6
ALB 2012 29
ALB 2008 30
ALB 2005 30.6
DZA 2011 27.6
AGO 2018 51.3
AGO 2008 42.7")
library(dplyr)
Gini %>%
transmute(iso3c, date = date - 1, Gini.lag.1 = Gini) %>%
full_join(Gini, ., by = c("iso3c", "date")) %>%
arrange(iso3c, desc(date))
# # A tibble: 17 x 4
# iso3c date Gini Gini.lag.1
# <chr> <dbl> <dbl> <dbl>
# 1 AGO 2018 51.3 NA
# 2 AGO 2017 NA 51.3
# 3 AGO 2008 42.7 NA
# 4 AGO 2007 NA 42.7
# 5 ALB 2017 33.2 NA
# 6 ALB 2016 33.7 33.2
# 7 ALB 2015 32.9 33.7
# 8 ALB 2014 34.6 32.9
# 9 ALB 2013 NA 34.6
# 10 ALB 2012 29 NA
# 11 ALB 2011 NA 29
# 12 ALB 2008 30 NA
# 13 ALB 2007 NA 30
# 14 ALB 2005 30.6 NA
# 15 ALB 2004 NA 30.6
# 16 DZA 2011 27.6 NA
# 17 DZA 2010 NA 27.6
如果您需要這樣做n
次(每次多延遲一次),您可以通過以下方式以編程方式擴展它:
Ginilags <- lapply(1:3, function(lg) {
z <- transmute(Gini, iso3c, date = date - lg, Gini)
names(z)[3] <- paste0("Gini.lag.", lg)
z
})
Reduce(function(a,b) full_join(a, b, by = c("iso3c", "date")),
c(list(Gini), Ginilags)) %>%
arrange(iso3c, desc(date))
# # A tibble: 28 x 6
# iso3c date Gini Gini.lag.1 Gini.lag.2 Gini.lag.3
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AGO 2018 51.3 NA NA NA
# 2 AGO 2017 NA 51.3 NA NA
# 3 AGO 2016 NA NA 51.3 NA
# 4 AGO 2015 NA NA NA 51.3
# 5 AGO 2008 42.7 NA NA NA
# 6 AGO 2007 NA 42.7 NA NA
# 7 AGO 2006 NA NA 42.7 NA
# 8 AGO 2005 NA NA NA 42.7
# 9 ALB 2017 33.2 NA NA NA
# 10 ALB 2016 33.7 33.2 NA NA
# # ... with 18 more rows
您可以創建原始表的副本,但減去一年的日期。 然后只需在iso3c
和date
列上將兩者連接在一起即可獲得所需的最終結果。
像這樣
Gini_lagged <- data.frame(
iso3c = Gini$iso3c,
date = Gini$date-1,
Gini.lag.1 = Gini$Gini)
merge(Gini,Gini_lagged,all=TRUE)
使用來自 tidyverse 的 dplyr 和 tidyr,您可以執行逐行變異以查找與當前行中的年份減 1 匹配的年份。
library(tidyverse)
Gini %>%
rowwise() %>%
mutate(Gini.lag.1 = list(Gini$Gini[date-1 == Gini$date])) %>%
unnest(c(Gini.lag.1), keep_empty = T)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.