Repeat data by group per year
I have a data frame with more than a thousand rows that looks like this.
org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil
I want to reshape my data like this: each row should be repeated for every year from its first appearance through 2021.
org year country
a 2010 US
a 2011 US
a 2012 US
a 2013 US
...
...
a 2021 US
a 2012 UK
a 2013 UK
a 2014 UK
a 2015 UK
...
a 2021 UK
b 2014 Mexico
b 2015 Mexico
b 2016 Mexico
...
b 2021 Mexico
b 2014 CHile
b 2015 CHile
b 2016 CHile
...
b 2021 CHile
b 2015 Brazil
b 2016 Brazil
b 2017 Brazil
...
b 2021 Brazil
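I have tried the following code. It produces the whole range of years rather than only the years since each first appearance. Any suggestions would be greatly appreciated!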
data <- data %>%
  # expand all years by country
  group_by(org) %>%
  expand(country, year = full_seq(year, 1)) %>%
  ungroup() %>%
  # join with original data to get X values
  left_join(data) %>%
  # fill the missing country
  fill(country)
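One possible reading of why this attempt falls short, sketched on the sample rows only: within each org, expand() crosses every country with every year that full_seq() builds from the observed years. The years therefore stop at the org's last observed year instead of 2021, and a country can be paired with years before its own first appearance. The object name attempt is just an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
data <- tibble::tribble(
  ~org, ~year, ~country,
  "a",  2010,  "US",
  "a",  2012,  "UK",
  "b",  2014,  "Mexico",
  "b",  2014,  "CHile",
  "b",  2015,  "Brazil"
)

# The expand() step from the attempt, run on its own
attempt <- data %>%
  group_by(org) %>%
  expand(country, year = full_seq(year, 1)) %>%
  ungroup()

# Every country of an org is crossed with every year in that org's range,
# so UK is paired with 2010 and no year goes past the org's last observed year
attempt
```

Since country has no missing values after this step, the later left_join() and fill() calls have nothing left to repair.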
How about something like this? Expand the years with map, then unnest:
library(tidyverse)
data <- read_table("org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil")
data |>
  mutate(year = map(year, ~seq(.x, 2021, 1))) |>
  unnest_longer(year)
#> # A tibble: 45 x 3
#> org year country
#> <chr> <dbl> <chr>
#> 1 a 2010 US
#> 2 a 2011 US
#> 3 a 2012 US
#> 4 a 2013 US
#> 5 a 2014 US
#> 6 a 2015 US
#> 7 a 2016 US
#> 8 a 2017 US
#> 9 a 2018 US
#> 10 a 2019 US
#> # ... with 35 more rows
It's a bit counterintuitive, but you can "summarize" to more rows than the input, which is handy in this case:
library(dplyr)
max_yr <- 2021
data <- data %>%
  group_by(org, country) %>%
  summarize(
    year = min(year):max_yr,
    .groups = "drop"
  )
print(data, n = 20)
# A tibble: 45 × 3
org country year
<chr> <chr> <int>
1 a UK 2012
2 a UK 2013
3 a UK 2014
4 a UK 2015
5 a UK 2016
6 a UK 2017
7 a UK 2018
8 a UK 2019
9 a UK 2020
10 a UK 2021
11 a US 2010
12 a US 2011
13 a US 2012
14 a US 2013
15 a US 2014
16 a US 2015
17 a US 2016
18 a US 2017
19 a US 2018
20 a US 2019
# … with 25 more rows
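A note for newer dplyr versions: since dplyr 1.1.0, summarize() warns when a group returns more than one row, and reframe() is the function meant for that case. Below is a minimal sketch of the same idea with reframe(), assuming dplyr 1.1.0 or later. The sample data is rebuilt from the question, and result is just an illustrative name.

```r
library(dplyr)

# Sample data copied from the question
data <- tibble::tribble(
  ~org, ~year, ~country,
  "a",  2010,  "US",
  "a",  2012,  "UK",
  "b",  2014,  "Mexico",
  "b",  2014,  "CHile",
  "b",  2015,  "Brazil"
)

max_yr <- 2021

# reframe() may return any number of rows per group;
# .by groups for this call only, so no ungroup() is needed
result <- data %>%
  reframe(year = min(year):max_yr, .by = c(org, country))

print(result, n = 20)
```

Unlike group_by(), .by keeps the groups in their order of first appearance, so the US rows come before the UK rows here.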
Using complete:
library(dplyr)
library(tidyr)
# df1 holds the data frame from the question
df1 %>%
  group_by(org, country) %>%
  complete(year = year:2021) %>%
  ungroup()
Output:
# A tibble: 45 × 3
org country year
<chr> <chr> <dbl>
1 a UK 2012
2 a UK 2013
3 a UK 2014
4 a UK 2015
5 a UK 2016
6 a UK 2017
7 a UK 2018
8 a UK 2019
9 a UK 2020
10 a UK 2021
# … with 35 more rows
Or with data.table:
library(data.table)
setDT(df1)[, .(year = year:2021), .(org, country)]
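One caveat about year:2021 in the complete and data.table versions: it relies on each (org, country) pair occurring in a single row. If a pair can occur more than once, year has several elements, and R warns and uses only the first one. A minimal sketch that starts from the earliest year instead is shown below. The extra (a, US, 2015) row is hypothetical and added only to show the duplicate case, and end_year and out are illustrative names.

```r
library(data.table)

# Sample data from the question plus one hypothetical repeat of (a, US)
df1 <- data.table(
  org     = c("a", "a", "a", "b", "b", "b"),
  year    = c(2010L, 2015L, 2012L, 2014L, 2014L, 2015L),
  country = c("US", "US", "UK", "Mexico", "CHile", "Brazil")
)

end_year <- 2021L  # last year to fill, as in the answers above

# min(year) gives one start year per group even when the pair repeats
out <- df1[, .(year = min(year):end_year), by = .(org, country)]

out
```

With exactly one row per pair, this returns the same rows as the one-liner above.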