Repeat rows based on time values split across multiple columns - R
I am trying to repeat rows based on month and year values.
Currently, my df looks like this:
Country Date Year Month
Angola 1/2008 2008 1
Angola 6/2020 2020 6
Benin 1/2013 2013 1
Benin 6/2020 2020 6
Benin 7/2014 2014 7
For each country, I want to repeat the observations so that the df looks like this:
Country Year Month
Angola 2008 1
Angola 2008 2
Angola 2008 3
Angola 2008 4
Angola 2008 5
Angola 2008 6
etc... all the way until 06/2020 for Angola
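There is a very elegant solution for repeating rows based on a value (from this post). If I were repeating rows based on the year only, the syntax would look like this: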
df <- df %>%
  mutate(Year = readr::parse_number(Year)) %>%
  group_by(Country) %>%
  complete(Year = min(Year):max(Year))
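However, the time frame I want to repeat over depends not only on the year but also on the month, and I haven't found a good way to adapt this syntax to do that. I tried parsing the Date variable as a date and then repeating based on that date, but this assigns a day to the variable and repeats the rows far more times than I need.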
df <- df %>%
  mutate(Date = readr::parse_datetime(Date)) %>%
  group_by(Country) %>%
  complete(Date = min(Date):max(Date))
Any ideas on how to do this? I would prefer to adapt the syntax I have been trying, but I am also open to new possibilities.
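A likely reason the date-based attempt over-expands: the `:` operator ignores the date class and counts in the underlying unit (days for a `Date`, seconds for a date-time), so every day in the range becomes a row. The base R sketch below assumes the values have already been parsed to `Date`; the two example dates are made up for illustration.

```r
# Two illustrative month-start dates (assumed, not taken from a parsed column)
start <- as.Date("2008-01-01")
end   <- as.Date("2008-06-01")

# `:` works on the underlying day counts, so it yields one value per day
n_days <- length(start:end)

# seq() with by = "month" steps one calendar month at a time
month_seq <- seq(start, end, by = "month")
n_months  <- length(month_seq)

print(n_days)
print(n_months)
print(month_seq)
```

Using `seq(..., by = "month")` inside `complete()` instead of `min(Date):max(Date)` keeps the expansion at one row per month.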
We remove the Date column and, after grouping by 'Country', use complete with sequences of 'Year' and 'Month':
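library(dplyr)
library(tidyr)  # complete() and fill() come from tidyr
out <- df1 %>%
  select(-Date) %>%
  mutate(Month2 = Month) %>%  # keep a copy of the observed months
  group_by(Country) %>%
  complete(Year = min(Year):max(Year), Month = first(Month):12) %>%
  fill(Month2) %>%
  # in the final year, keep only months up to the last observed month
  filter(Year == max(Year) & Month <= last(Month2) | Year != max(Year)) %>%
  select(-Month2)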
out
# A tibble: 240 x 3
# Groups: Country [2]
# Country Year Month
# <chr> <int> <int>
# 1 Angola 2008 1
# 2 Angola 2008 2
# 3 Angola 2008 3
# 4 Angola 2008 4
# 5 Angola 2008 5
# 6 Angola 2008 6
# 7 Angola 2008 7
# 8 Angola 2008 8
# 9 Angola 2008 9
#10 Angola 2008 10
# … with 230 more rows
- checking the output
- head
out %>%
filter(Country == 'Angola') %>%
head(14)
# A tibble: 14 x 3
# Groups: Country [1]
Country Year Month
<chr> <int> <int>
1 Angola 2008 1
2 Angola 2008 2
3 Angola 2008 3
4 Angola 2008 4
5 Angola 2008 5
6 Angola 2008 6
7 Angola 2008 7
8 Angola 2008 8
9 Angola 2008 9
10 Angola 2008 10
11 Angola 2008 11
12 Angola 2008 12
13 Angola 2009 1
14 Angola 2009 2
- tail
out %>%
filter(Country == 'Angola') %>%
tail(10)
# A tibble: 10 x 3
# Groups: Country [1]
Country Year Month
<chr> <int> <int>
1 Angola 2019 9
2 Angola 2019 10
3 Angola 2019 11
4 Angola 2019 12
5 Angola 2020 1
6 Angola 2020 2
7 Angola 2020 3
8 Angola 2020 4
9 Angola 2020 5
10 Angola 2020 6
df1 <- structure(list(Country = c("Angola", "Angola", "Benin", "Benin",
"Benin"), Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"
), Year = c(2008L, 2020L, 2013L, 2020L, 2014L), Month = c(1L,
6L, 1L, 6L, 7L)), class = "data.frame", row.names = c(NA, -5L
))
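One caveat with `first(Month):12`: it uses the first row's month as the lower bound for every year, which is fine here because both countries start in January, but a country starting in, say, March would lose January and February in later years. A possible variant, sketched below, expands over a single running month index instead. The column name `idx` and the object `out_idx` are names introduced for this illustration, and the data is re-created without the Date column so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, without the Date column
df1 <- data.frame(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Year    = c(2008L, 2020L, 2013L, 2020L, 2014L),
  Month   = c(1L, 6L, 1L, 6L, 7L)
)

out_idx <- df1 %>%
  # idx counts months on one scale, so consecutive months differ by exactly 1
  mutate(idx = Year * 12L + (Month - 1L)) %>%
  group_by(Country) %>%
  # fill in every month index between the first and last observation
  complete(idx = min(idx):max(idx)) %>%
  ungroup() %>%
  # convert the index back to calendar year and month
  mutate(Year = idx %/% 12L, Month = idx %% 12L + 1L) %>%
  select(Country, Year, Month)

print(head(out_idx, 3))
print(tail(out_idx, 3))
```

Because the range is built from the minimum and maximum index per country, this does not depend on row order or on the starting month.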
library(tidyverse)
df <- tibble(
Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"),
Year = c(2008, 2020, 2013, 2020, 2014),
Month = c(1,6,1,6,7))
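df %>%
  group_by(Country) %>%
  # prepend a day so "1/2008" parses as the first day of that month
  mutate(Date = lubridate::dmy(paste("1", Date))) %>%
  select(-Month, -Year) %>%
  # one row per calendar month between the first and last date
  complete(Date = seq(min(Date), max(Date), by = "months"))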
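The result above has a Date column rather than the separate Year and Month columns asked for in the question. A minimal sketch of one way to get them back, assuming the same sample data: the object name `out2` is made up here, and `as.Date()` with an explicit format is swapped in for `dmy()` only to make the parsing step explicit.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- tibble(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Date    = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014")
)

out2 <- df %>%
  # "1/2008" -> "1/1/2008", parsed as day/month/year
  mutate(Date = as.Date(paste0("1/", Date), format = "%d/%m/%Y")) %>%
  group_by(Country) %>%
  # one row per calendar month within each country
  complete(Date = seq(min(Date), max(Date), by = "month")) %>%
  ungroup() %>%
  # split the completed dates back into the two requested columns
  mutate(Year = year(Date), Month = month(Date)) %>%
  select(Country, Year, Month)

print(out2)
```

Here Year and Month come back as numeric columns; wrap them in `as.integer()` if integer columns are preferred.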