Compute multiple new columns by name pattern in R

Question

I have data with population numbers, births and deaths by year and country, disaggregated by gender and age. I would like to compute the net migration rate for each year-country-gender-age combination. Here is what the data looks like:

The formula to compute the net migration rate (following the naming convention of the data) would be: 2001_netmigration = 2001_pop - 2000_deaths + 2000_births - 2000_pop. I want to perform this for all years from 2001 to 2020, i.e. over all columns.

I tried the following code:

n <- 2001

while(n <= 2020){
  aux  <- aux %>% 
    mutate(., paste0(n,"_netmigr") = paste0(n,"_pop") - paste0((n-1),"_deaths") + 
             paste0((n-1),"_births") - paste0((n-1),"_pop"), .after = paste0(n,"_pop"))
}

When I manually run the code inside the while loop using actual names instead of the paste0 commands, it works exactly as I want it to. Is there a way to iteratively specify/identify names that I am not seeing?

Thankful for any insights!
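
For context on why the loop fails: paste0() returns a character string rather than a column, `=` inside mutate() cannot take a computed name on its left-hand side, and n is never incremented. The following is a minimal sketch of the name lookup the loop is reaching for, assuming dplyr's `:=` operator with `!!` and the `.data` pronoun. The small aux table and the new_col helper are made up for illustration and are not the asker's data.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data: one row, years 2000-2002
aux <- tibble(
  `2000_pop` = 100, `2000_births` = 10, `2000_deaths` = 5,
  `2001_pop` = 120, `2001_births` = 12, `2001_deaths` = 6,
  `2002_pop` = 150, `2002_births` = 15, `2002_deaths` = 7
)

for (n in 2001:2002) {
  new_col <- paste0(n, "_netmigr")  # name of the column to create
  aux <- aux |>
    mutate(
      # `!!new_col :=` injects the computed name;
      # .data[[...]] looks up a column by its name as a string
      !!new_col := .data[[paste0(n, "_pop")]] -
        .data[[paste0(n - 1, "_deaths")]] +
        .data[[paste0(n - 1, "_births")]] -
        .data[[paste0(n - 1, "_pop")]],
      .after = all_of(paste0(n, "_pop"))
    )
}

names(aux)
```

A for loop over the years replaces the while loop, so no manual counter is needed.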

Answer 1

Here's some sample data:

library(tidyr)

tb <- expand_grid(country = letters[1:5], sex = c("male", "female"))
for (yr in 2000:2020) tb[[paste0(yr, "_pop")]] <- sample(1e6, nrow(tb))
for (yr in 2000:2020) tb[[paste0(yr, "_births")]] <- sample(1e6, nrow(tb))
for (yr in 2000:2020) tb[[paste0(yr, "_deaths")]] <- sample(1e6, nrow(tb))

tb
# A tibble: 10 × 65
   country sex    `2000_pop` `2001_pop` `2002_pop` `2003_pop` `2004_pop`
   <chr>   <chr>       <int>      <int>      <int>      <int>      <int>
 1 a       male       494854     125496     441605     850152     564524
 2 a       female      15675     700400     884402     722577     488377
 3 b       male       863598     430942     178898     962331     762543
 ...

Let's reshape:

tb <- tb |> 
        pivot_longer(starts_with("20"), names_to = c("year", "var"), 
                       names_sep = "_") |> 
        pivot_wider(names_from = "var")
tb
# A tibble: 210 × 6
   country sex   year     pop births deaths
   <chr>   <chr> <chr>  <int>  <int>  <int>
 1 a       male  2000  494854 692068 890029
 2 a       male  2001  125496 420085 334800
 3 a       male  2002  441605 341633 816369
 4 a       male  2003  850152 310789 766912
 ...

Now your data is tidy, and no for loop or column name munging is required:

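library(dplyr)

# The question's formula uses the previous year's values, so lag within
# each country-sex group (add age to group_by() if your data has it).
# The first year of each group has no predecessor and comes out as NA.
tb <- tb |>
        group_by(country, sex) |>
        mutate(net_migr = pop - lag(deaths) + lag(births) - lag(pop)) |>
        ungroup()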

If you want to, you can now return tb to wide format. (But why would you want to?)
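
If the wide layout is needed after all, one way back is a second pivot_wider(). The following is a minimal sketch, not the answerer's code: it uses a small made-up table (wide, long and back are hypothetical names), the lagged formula from the question, and tidyr's names_glue argument to rebuild column names in the 2001_pop style.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: 2 countries x 2 sexes, years 2000-2002
wide <- expand_grid(country = c("a", "b"), sex = c("male", "female"))
for (yr in 2000:2002) {
  wide[[paste0(yr, "_pop")]]    <- 100 * (yr - 1999) + seq_len(nrow(wide))
  wide[[paste0(yr, "_births")]] <- 10 + seq_len(nrow(wide))
  wide[[paste0(yr, "_deaths")]] <- 5 + seq_len(nrow(wide))
}

# Wide -> long, one row per country-sex-year, then the lagged formula
long <- wide |>
  pivot_longer(matches("^[0-9]{4}_"), names_to = c("year", "var"),
               names_sep = "_") |>
  pivot_wider(names_from = "var") |>
  group_by(country, sex) |>
  mutate(net_migr = pop - lag(deaths) + lag(births) - lag(pop)) |>
  ungroup()

# Long -> wide again, naming columns <year>_<variable>
back <- long |>
  pivot_wider(names_from = year,
              values_from = c(pop, births, deaths, net_migr),
              names_glue = "{year}_{.value}")

names(back)
```

By default the result lists all pop columns first, then births, and so on. With tidyr 1.2.0 or later, adding names_vary = "slowest" groups the columns by year instead.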

Accepted 2022-06-03 14:02:08
