R - Recursively creating dataframe columns through indexing in a function

Question

I have a dataframe with a large amount of annual data. For example, consider the following toy example:

dat <- data.frame(id = 1:2, quantity = 3:4, agg_2002 = 5:6, agg_2003 = 7:8, agg_2020 = 9:10)

What I would like to do is the following:

1. Look for columns whose names contain "agg_" in the set of column names, names(dat).
2. Substitute "change_" for the "agg_" in those names.
3. Calculate the relative change from year to year, so for example:
   dat$change_2002 <- dat$agg_2002/dat$agg_2002 (since 2002 is the first year)
   dat$change_2003 <- dat$agg_2003/dat$agg_2002
   dat$change_2004 <- dat$agg_2004/dat$agg_2003
   ...all the way up to 2020 or the latest year with "agg_" in the column name.
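
For the toy data, the calculation described above can be written out by hand as follows. This is only a sketch of the manual version that the question wants to avoid; the names agg_cols, change_cols and manual are made up for illustration.

```r
dat <- data.frame(id = 1:2, quantity = 3:4, agg_2002 = 5:6, agg_2003 = 7:8, agg_2020 = 9:10)

# Find the "agg_" columns and build the corresponding "change_" names
agg_cols <- grep("^agg_", names(dat), value = TRUE)
change_cols <- sub("^agg_", "change_", agg_cols)

# Relative change, written out by hand for the three years in the toy data
manual <- dat
manual$change_2002 <- manual$agg_2002 / manual$agg_2002  # first year: divided by itself
manual$change_2003 <- manual$agg_2003 / manual$agg_2002
manual$change_2020 <- manual$agg_2020 / manual$agg_2003
```

Each new column depends only on its own "agg_" column and the one before it, which is what makes the calculation easy to loop over.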

What I have so far is the following function:

func <- function(dat, overwrite = FALSE) {
  nms <- grep("agg_[0-9]+$", names(dat), value = TRUE)
  revnms <- gsub("agg_", "chg_", nms)
  for i = 1:ncol(df) %in% revnms{
    dat[, rvnms][i] <- lapply(dat[, rvnms][i], `/`, dat[, rvnms][i-1])
  }
  dat
}

What I am struggling with is the indexing. How do I get R to make the above calculations recursively without having to do it manually? The desired result is the "chg_" columns appended to the original dataframe:

  id quantity agg_2002 agg_2003 agg_2020 chg_2002 chg_2003 chg_2020
1  1        3        5        7        9        1     1.40     1.29
2  2        4        6        8       10        1     1.33     1.25

I would like to modify the function above to produce the desired result via lapply if possible. All ideas are welcome. Thank you.

UPDATE: I would much prefer something using lapply or something that can accommodate differing data types.

Answer 1

Here is a solution with dplyr and tidyr:

library(tidyr)
library(dplyr)

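dat %>%
  # Reshape to long form: one row per id and year, with the "agg_" prefix stripped
  pivot_longer(cols = starts_with("agg"),
               names_to = "year",
               names_prefix = "agg_",
               values_to = "agg") %>%
  group_by(id) %>%
  arrange(year) %>%
  # Relative change within each id; lag() is NA for the first year, so change_2002 is NA
  mutate(change = agg / lag(agg, 1)) %>%
  # Back to wide form: agg_<year> and change_<year> columns
  pivot_wider(names_from = year, values_from = c("agg", "change"))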
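
With this pipeline, lag() has no previous value for the first year, so change_2002 comes out as NA rather than the 1 the question asks for. One possible adjustment, shown here only as a sketch, sets the first row of each id to 1 explicitly. The result name res is made up for illustration, and the sketch assumes four-digit years so that sorting the year strings gives chronological order.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(id = 1:2, quantity = 3:4, agg_2002 = 5:6, agg_2003 = 7:8, agg_2020 = 9:10)

res <- dat %>%
  pivot_longer(cols = starts_with("agg_"),
               names_to = "year",
               names_prefix = "agg_",
               values_to = "agg") %>%
  group_by(id) %>%
  # Sort by year within each id so that lag() refers to the previous year
  arrange(year, .by_group = TRUE) %>%
  # First year of each id gets 1; later years are divided by the previous year
  mutate(change = if_else(row_number() == 1, 1, agg / lag(agg))) %>%
  ungroup() %>%
  pivot_wider(names_from = year, values_from = c(agg, change))
```

Using row_number() instead of replacing every NA keeps genuinely missing values in later years as NA.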

Answer 2

You can pivot the table to long form, change the names (gsub would also work), then pivot it back to wide form:

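library(tidyverse)  # attaches dplyr, tidyr and stringr

df <- dat %>%
  # Long form: "name" holds the old column name, "value" holds the yearly amount
  pivot_longer(-c(id, quantity), names_to = "name", values_to = "value") %>%
  mutate(name = str_replace(name, "agg", "change")) %>%
  group_by(id) %>%
  # Divide by the previous year's value; the first year has no lag, so it is divided by itself
  mutate(value = ifelse(is.na(lag(value)), value / value, value / lag(value))) %>%
  ungroup() %>%
  pivot_wider(names_from = "name", values_from = "value")

# Append the new change_ columns to the original data
inner_join(dat, df, by = c("id", "quantity"))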

  id quantity agg_2002 agg_2003 agg_2020 change_2002 change_2003 change_2020
1  1        3        5        7        9           1    1.400000    1.285714
2  2        4        6        8       10           1    1.333333    1.250000
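
Since the question asks for something based on lapply, here is a minimal base R sketch along those lines. It is not taken from either answer: the function name add_changes and its arguments prefix and new_prefix are hypothetical, and it assumes the suffix after the prefix is a numeric year and that the prefix contains no regex metacharacters.

```r
add_changes <- function(dat, prefix = "agg_", new_prefix = "chg_") {
  # Columns such as agg_2002, agg_2003, ... (prefix followed by digits only)
  nms <- grep(paste0("^", prefix, "[0-9]+$"), names(dat), value = TRUE)
  if (length(nms) == 0) return(dat)
  # Sort by numeric year so that each column follows its predecessor
  nms <- nms[order(as.integer(sub(prefix, "", nms, fixed = TRUE)))]
  # Denominators: the previous column, or the column itself for the first year
  prev <- c(nms[1], head(nms, -1))
  new_nms <- sub(prefix, new_prefix, nms, fixed = TRUE)
  # One division per column; the results are appended under the new names
  dat[new_nms] <- lapply(seq_along(nms), function(i) dat[[nms[i]]] / dat[[prev[i]]])
  dat
}

dat <- data.frame(id = 1:2, quantity = 3:4, agg_2002 = 5:6, agg_2003 = 7:8, agg_2020 = 9:10)
out <- add_changes(dat)
```

Indexing by column name rather than by position avoids the off-by-one problem in the original loop, and columns that do not match the pattern are left untouched whatever their type.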
