
Repeat rows based on time values split across multiple columns - R

I am trying to repeat rows based on month and year values.

Currently, my df looks like this:

Country Date    Year   Month
Angola  1/2008  2008    1
Angola  6/2020  2020    6
Benin   1/2013  2013    1
Benin   6/2020  2020    6
Benin   7/2014  2014    7

For each country, I want to repeat the observations so that the df looks like this:

Country Year   Month
Angola  2008    1
Angola  2008    2
Angola  2008    3
Angola  2008    4
Angola  2008    5
Angola  2008    6

etc... all the way until 06/2020 for Angola

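There is a very elegant solution for repeating rows based on values (from this post). If I were repeating rows based on year alone, the syntax would look like this: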

df <- df %>%
  mutate(Year = readr::parse_number(Year)) %>%
  group_by(Country) %>%
  complete(Year = min(Year):max(Year))

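However, the time frame I want to repeat over depends on the month as well as the year, and I have not found a good way to adapt this syntax to handle that. I tried parsing the Date variable as a date and repeating based on it, but this assigns a day to the variable and repeats the rows far more times than I need.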

df <- df %>%
  mutate(Date = readr::parse_datetime(Date)) %>%
  group_by(Country) %>%
  complete(Date = min(Date):max(Date))

Any ideas on how to do this? I would prefer to adapt the syntax I have been trying, but I am also open to new approaches.

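We remove the Date column and, after grouping by 'Country', use complete with sequences of 'Year' and 'Month'. A copy of the observed months (Month2) is kept so that months after the last observed month in the final year can be filtered out afterwards.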

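library(dplyr)
library(tidyr)  # complete() and fill() come from tidyr

out <- df1 %>% 
   select(-Date) %>% 
   mutate(Month2 = Month) %>%   # copy of the observed months, used by the final filter
   group_by(Country) %>% 
   # all combinations of Year (first to last) and Month (first observed month to 12)
   complete(Year = min(Year):max(Year), Month = first(Month):12) %>% 
   fill(Month2) %>%
   # in the last year, keep only months up to the last observed month
   filter(Year == max(Year) & Month <= last(Month2) | Year != max(Year)) %>%
   select(-Month2)
out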
# A tibble: 240 x 3
# Groups:   Country [2]
#   Country  Year Month
#   <chr>   <int> <int>
# 1 Angola   2008     1
# 2 Angola   2008     2
# 3 Angola   2008     3
# 4 Angola   2008     4
# 5 Angola   2008     5
# 6 Angola   2008     6
# 7 Angola   2008     7
# 8 Angola   2008     8
# 9 Angola   2008     9
#10 Angola   2008    10
# … with 231 more rows

- Check the output

- Head

out %>%
   filter(Country == 'Angola') %>% 
   head(14)
# A tibble: 14 x 3
# Groups:   Country [1]
   Country  Year Month
   <chr>   <int> <int>
 1 Angola   2008     1
 2 Angola   2008     2
 3 Angola   2008     3
 4 Angola   2008     4
 5 Angola   2008     5
 6 Angola   2008     6
 7 Angola   2008     7
 8 Angola   2008     8
 9 Angola   2008     9
10 Angola   2008    10
11 Angola   2008    11
12 Angola   2008    12
13 Angola   2009     1
14 Angola   2009     2

- Tail

out %>%
   filter(Country == 'Angola') %>% 
   tail(10)
# A tibble: 10 x 3
# Groups:   Country [1]
   Country  Year Month
   <chr>   <int> <int>
 1 Angola   2019     9
 2 Angola   2019    10
 3 Angola   2019    11
 4 Angola   2019    12
 5 Angola   2020     1
 6 Angola   2020     2
 7 Angola   2020     3
 8 Angola   2020     4
 9 Angola   2020     5
10 Angola   2020     6

Data

df1 <- structure(list(Country = c("Angola", "Angola", "Benin", "Benin", 
"Benin"), Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"
), Year = c(2008L, 2020L, 2013L, 2020L, 2014L), Month = c(1L, 
6L, 1L, 6L, 7L)), class = "data.frame", row.names = c(NA, -5L
))
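
A variation that stays closer to the `min(x):max(x)` idea from the question is to fold Year and Month into a single running month index, expand that index, and then split it back into two columns. The following is only a minimal base R sketch under that assumption, not the method used in the answer above; `expand_months` and `idx` are hypothetical names introduced for illustration.

```r
# Same values as df1 above; the Date column is left out because it is not needed.
df1 <- data.frame(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Year    = c(2008L, 2020L, 2013L, 2020L, 2014L),
  Month   = c(1L, 6L, 1L, 6L, 7L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand one country's rows into every month between
# its earliest and latest (Year, Month) pair.
expand_months <- function(d) {
  idx <- d$Year * 12L + (d$Month - 1L)   # running month index
  all_idx <- seq(min(idx), max(idx))     # one entry per month in the range
  data.frame(
    Country = d$Country[1],
    Year    = all_idx %/% 12L,           # recover the year
    Month   = all_idx %% 12L + 1L,       # recover the month (1 to 12)
    stringsAsFactors = FALSE
  )
}

# Apply the helper per country and stack the results.
out <- do.call(rbind, lapply(split(df1, df1$Country), expand_months))
rownames(out) <- NULL
head(out)
```

Because the index counts months directly, no filtering step is needed for a partial final year, and a first month other than January is handled as well. The same index could presumably be passed to `complete()` inside a grouped pipeline, in the same way the question does with `Year`.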
library(tidyverse)

df <- tibble(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"),
  Year = c(2008, 2020, 2013, 2020, 2014),
  Month = c(1,6,1,6,7))


df %>%
  group_by(Country) %>%
  mutate(Date = lubridate::dmy(paste("1", Date))) %>%
  select(-Month, - Year) %>%
  complete(Date = seq(min(Date), max(Date), by = "months"))
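
The difference from the attempt in the question comes down to how the range is built: `:` works on the underlying number of a date (days for a Date, seconds for a date-time), while `seq()` with `by = "month"` steps one calendar month at a time. The short base R sketch below illustrates this on a three-month range and shows one possible way to turn the monthly dates back into Year and Month columns; the variable names are illustrative only.

```r
start <- as.Date("2008-01-01")
end   <- as.Date("2008-03-01")

# `:` drops the Date class and counts in days, so every day becomes a value.
by_colon <- start:end
length(by_colon)

# seq() with by = "month" yields one value per calendar month.
by_month <- seq(start, end, by = "month")
length(by_month)

# Recover integer Year and Month columns from the monthly dates.
ym <- data.frame(
  Year  = as.integer(format(by_month, "%Y")),
  Month = as.integer(format(by_month, "%m"))
)
ym
```

In a tidyverse pipeline the same recovery step could likely be written with `lubridate::year()` and `lubridate::month()` inside `mutate()`.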

