
Repeat data by group per year

I have a data frame with more than a thousand rows that looks like this.

org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil

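I would like to reshape my data like this. I want each row to be repeated for every year from its first appearance onward (through 2021).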

org year country
a 2010 US
a 2011 US
a 2012 US
a 2013 US
...
...
a 2021 US
a 2012 UK
a 2013 UK
a 2014 UK
a 2015 UK
...
a 2021 UK
b 2014 Mexico
b 2015 Mexico
b 2016 Mexico
...
b 2021 Mexico
b 2014 CHile
b 2015 CHile
b 2016 CHile
...
b 2021 CHile
b 2015 Brazil
b 2016 Brazil
b 2017 Brazil
...
b 2021 Brazil

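I have tried the following code. It generates the whole range of years instead of only the years since the first appearance. Any suggestions would be appreciated!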

data <- data %>% 
  # expand all years by country
  group_by(org) %>% 
  expand(country, year = full_seq(year, 1)) %>% 
  ungroup() %>% 
  # join with original data to get X values
  left_join(data) %>% 
  # fill the missing country
  fill(country)
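
A minimal sketch of why the attempt above behaves that way, assuming the five sample rows are the whole data set (the names `data` and `attempt` are only for illustration). Within each org, expand() crosses every country with every year in that org's observed range, so UK picks up 2010 and 2011, and nothing extends past the last observed year.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question
data <- data.frame(
  org     = c("a", "a", "b", "b", "b"),
  year    = c(2010, 2012, 2014, 2014, 2015),
  country = c("US", "UK", "Mexico", "CHile", "Brazil")
)

# The expansion step from the attempt, on its own
attempt <- data %>%
  group_by(org) %>%
  expand(country, year = full_seq(year, 1)) %>%
  ungroup()

# Every country is paired with every year between the org's
# first and last observed year, regardless of when it first appeared
print(attempt, n = Inf)
```

The start year has to be taken per row (or per org/country pair), not per org, and the end year has to be supplied explicitly.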

How about something like this? Expand the years with map, then unnest.

library(tidyverse)

data <- read_table("org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil")

data |>
  mutate(year = map(year, ~seq(.x, 2021, 1))) |>
  unnest_longer(year)
#> # A tibble: 45 x 3
#>    org    year country
#>    <chr> <dbl> <chr>  
#>  1 a      2010 US     
#>  2 a      2011 US     
#>  3 a      2012 US     
#>  4 a      2013 US     
#>  5 a      2014 US     
#>  6 a      2015 US     
#>  7 a      2016 US     
#>  8 a      2017 US     
#>  9 a      2018 US     
#> 10 a      2019 US     
#> # ... with 35 more rows
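
To make the two steps easier to follow, here is a sketch that pauses between them, assuming the same five sample rows (the intermediate names `nested` and `expanded` are hypothetical). After the map step, `year` is a list column holding one vector of years per original row; unnest_longer() then turns each vector element into its own row.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Sample rows copied from the question
data <- data.frame(
  org     = c("a", "a", "b", "b", "b"),
  year    = c(2010, 2012, 2014, 2014, 2015),
  country = c("US", "UK", "Mexico", "CHile", "Brazil")
)

# Step 1: replace each start year with the full sequence up to 2021
nested <- data %>%
  mutate(year = map(year, ~ seq(.x, 2021, 1)))

# Number of years generated for each original row
lengths(nested$year)

# Step 2: one row per element of the list column
expanded <- nested %>%
  unnest_longer(year)
```

Because the sequence is built row by row, this does not need any grouping, and duplicate org/country pairs would each be expanded separately.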

It's a bit counterintuitive, but you can "summarize" to more rows than the input, which comes in handy in this case:

library(dplyr)

max_yr <- 2021

data <- data %>% 
  group_by(org, country) %>%
  summarize(
    year = min(year):max_yr,
    .groups = "drop"
  )

print(data, n = 20)
# A tibble: 45 × 3
   org   country  year
   <chr> <chr>   <int>
 1 a     UK       2012
 2 a     UK       2013
 3 a     UK       2014
 4 a     UK       2015
 5 a     UK       2016
 6 a     UK       2017
 7 a     UK       2018
 8 a     UK       2019
 9 a     UK       2020
10 a     UK       2021
11 a     US       2010
12 a     US       2011
13 a     US       2012
14 a     US       2013
15 a     US       2014
16 a     US       2015
17 a     US       2016
18 a     US       2017
19 a     US       2018
20 a     US       2019
# … with 25 more rows
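
A note for newer dplyr versions: since dplyr 1.1.0, returning more than one row per group from summarize() is deprecated and reframe() is the intended replacement. A minimal sketch of the same idea with reframe(), assuming dplyr >= 1.1.0 and the five sample rows from the question (`expanded` is just an illustrative name):

```r
library(dplyr)

# Sample rows copied from the question
data <- data.frame(
  org     = c("a", "a", "b", "b", "b"),
  year    = c(2010, 2012, 2014, 2014, 2015),
  country = c("US", "UK", "Mexico", "CHile", "Brazil")
)

max_yr <- 2021

# reframe() may return any number of rows per group;
# .by groups for this call only, so no ungroup() is needed
expanded <- data %>%
  reframe(year = min(year):max_yr, .by = c(org, country))
```

With `.by`, groups come out in order of first appearance rather than sorted, so the row order can differ from the summarize() output above.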

Using complete:

library(dplyr)
library(tidyr)

# df1 is the data frame from the question
df1 %>% 
  group_by(org, country) %>% 
  complete(year = year:2021) %>% 
  ungroup()

Output:

# A tibble: 45 × 3
   org   country  year
   <chr> <chr>   <dbl>
 1 a     UK       2012
 2 a     UK       2013
 3 a     UK       2014
 4 a     UK       2015
 5 a     UK       2016
 6 a     UK       2017
 7 a     UK       2018
 8 a     UK       2019
 9 a     UK       2020
10 a     UK       2021
# … with 35 more rows

Using data.table:

library(data.table)
setDT(df1)[, .(year = year:2021), .(org, country)]
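
A self-contained variant of the one-liner, as a sketch only: it assumes the five sample rows from the question and swaps `year` for `min(year)`, so that an org/country pair appearing more than once would still produce a single start year (that change is my assumption, not part of the original answer).

```r
library(data.table)

# Sample rows copied from the question
df1 <- data.table(
  org     = c("a", "a", "b", "b", "b"),
  year    = c(2010, 2012, 2014, 2014, 2015),
  country = c("US", "UK", "Mexico", "CHile", "Brazil")
)

# For each org/country pair, generate every year from the
# first observed year through 2021
expanded <- df1[, .(year = min(year):2021), by = .(org, country)]
```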
