简体   繁体   中英

Repeat data by group per year

I have a data frame with more than thousand rows like this.

org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil

And I would like to make my data like this. I want my data to be repeated after appearing.

org year country
a 2010 US
a 2011 US
a 2012 US
a 2013 US
...
...
a 2021 US
a 2012 UK
a 2013 UK
a 2014 UK
a 2015 UK
...
a 2021 UK
b 2014 Mexico
b 2015 Mexico
b 2016 Mexico
...
b 2021 Mexico
b 2014 CHile
b 2015 CHile
b 2016 CHile
...
b 2021 CHile
b 2015 Brazil
b 2016 Brazil
b 2017 Brazil
...
b 2021 Brazil

And I have tried the following code. It generated the whole year rather than years since the first appearance. Any suggestions would be thankful!

data <- data %>% 
  # expand all years by country
  group_by(org) %>% 
  expand(country, year = full_seq(year, 1)) %>% 
  ungroup() %>% 
  # join with original data to get X values
  left_join(data) %>% 
  # fill the missing country
  fill(country)

How about something like this. Expand the year through map and then unnest :

library(tidyverse)

data <- read_table("org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil")

data |>
  mutate(year = map(year, ~seq(.x, 2021, 1))) |>
  unnest_longer(year)
#> # A tibble: 45 x 3
#>    org    year country
#>    <chr> <dbl> <chr>  
#>  1 a      2010 US     
#>  2 a      2011 US     
#>  3 a      2012 US     
#>  4 a      2013 US     
#>  5 a      2014 US     
#>  6 a      2015 US     
#>  7 a      2016 US     
#>  8 a      2017 US     
#>  9 a      2018 US     
#> 10 a      2019 US     
#> # ... with 35 more rows

A bit counterintuitive, but you can “summarize” to more rows than the input, which is handy in this case:

library(dplyr)

max_yr <- 2021

data <- data %>% 
  group_by(org, country) %>%
  summarize(
    year = min(year):max_yr,
    .groups = "drop"
  )

print(data, n = 20)
# A tibble: 45 × 3
   org   country  year
   <chr> <chr>   <int>
 1 a     UK       2012
 2 a     UK       2013
 3 a     UK       2014
 4 a     UK       2015
 5 a     UK       2016
 6 a     UK       2017
 7 a     UK       2018
 8 a     UK       2019
 9 a     UK       2020
10 a     UK       2021
11 a     US       2010
12 a     US       2011
13 a     US       2012
14 a     US       2013
15 a     US       2014
16 a     US       2015
17 a     US       2016
18 a     US       2017
19 a     US       2018
20 a     US       2019
# … with 25 more rows

Using complete

library(dplyr)
library(tidyr)
df1 %>% 
  group_by(org, country) %>% 
  complete(year = year:2021) %>% 
  ungroup

-output

# A tibble: 45 × 3
   org   country  year
   <chr> <chr>   <dbl>
 1 a     UK       2012
 2 a     UK       2013
 3 a     UK       2014
 4 a     UK       2015
 5 a     UK       2016
 6 a     UK       2017
 7 a     UK       2018
 8 a     UK       2019
 9 a     UK       2020
10 a     UK       2021
# … with 35 more rows

Or with data.table

library(data.table)
setDT(df1)[, .(year = year:2021), .(org, country)]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM