繁体   English   中英

每年按组重复数据

[英]Repeat data by group per year

我有一个像这样的超过一千行的数据框。

org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil

我想像这样制作我的数据。 我希望我的数据在出现后被重复。

org year country
a 2010 US
a 2011 US
a 2012 US
a 2013 US
...
...
a 2021 US
a 2012 UK
a 2013 UK
a 2014 UK
a 2015 UK
...
a 2021 UK
b 2014 Mexico
b 2015 Mexico
b 2016 Mexico
...
b 2021 Mexico
b 2014 CHile
b 2015 CHile
b 2016 CHile
...
b 2021 CHile
b 2015 Brazil
b 2016 Brazil
b 2017 Brazil
...
b 2021 Brazil

我已经尝试了以下代码。 自首次出现以来,它产生了一整年而不是几年。 任何建议将不胜感激!

data <- data %>% 
  # expand all years by country
  group_by(org) %>% 
  expand(country, year = full_seq(year, 1)) %>% 
  ungroup() %>% 
  # join with original data to get X values
  left_join(data) %>% 
  # fill the missing country
  fill(country)

这样的事情怎么样。 通过map展开年份,然后取消unnest

library(tidyverse)

data <- read_table("org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil")

data |>
  mutate(year = map(year, ~seq(.x, 2021, 1))) |>
  unnest_longer(year)
#> # A tibble: 45 x 3
#>    org    year country
#>    <chr> <dbl> <chr>  
#>  1 a      2010 US     
#>  2 a      2011 US     
#>  3 a      2012 US     
#>  4 a      2013 US     
#>  5 a      2014 US     
#>  6 a      2015 US     
#>  7 a      2016 US     
#>  8 a      2017 US     
#>  9 a      2018 US     
#> 10 a      2019 US     
#> # ... with 35 more rows

有点违反直觉,但您可以“汇总”到比输入更多的行,这在这种情况下很方便:

library(dplyr)

max_yr <- 2021

data <- data %>% 
  group_by(org, country) %>%
  summarize(
    year = min(year):max_yr,
    .groups = "drop"
  )

print(data, n = 20)
# A tibble: 45 × 3
   org   country  year
   <chr> <chr>   <int>
 1 a     UK       2012
 2 a     UK       2013
 3 a     UK       2014
 4 a     UK       2015
 5 a     UK       2016
 6 a     UK       2017
 7 a     UK       2018
 8 a     UK       2019
 9 a     UK       2020
10 a     UK       2021
11 a     US       2010
12 a     US       2011
13 a     US       2012
14 a     US       2013
15 a     US       2014
16 a     US       2015
17 a     US       2016
18 a     US       2017
19 a     US       2018
20 a     US       2019
# … with 25 more rows

使用complete

library(dplyr)
library(tidyr)
df1 %>% 
  group_by(org, country) %>% 
  complete(year = year:2021) %>% 
  ungroup

-输出

# A tibble: 45 × 3
   org   country  year
   <chr> <chr>   <dbl>
 1 a     UK       2012
 2 a     UK       2013
 3 a     UK       2014
 4 a     UK       2015
 5 a     UK       2016
 6 a     UK       2017
 7 a     UK       2018
 8 a     UK       2019
 9 a     UK       2020
10 a     UK       2021
# … with 35 more rows

data.table

library(data.table)
setDT(df1)[, .(year = year:2021), .(org, country)]

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM