R - recursively create dataframe columns via index inside function
I have a dataframe with a large amount of annual data. For example, consider the following toy example:
dat <- data.frame(id = 1:2, quantity = 3:4, agg_2002 = 5:6, agg_2003 = 7:8, agg_2020 = 9:10)
What I would like to do is the following:
Look for columns whose names contain "agg_" in the set of column names, names(df)
Replace the "agg_" in names(df) with "change_"
Calculate the relative change from year to year, so for example,
df$change_2002 <- df$agg_2002/df$agg_2002 (since 2002 is the first year)
df$change_2003 <- df$agg_2003/df$agg_2002
df$change_2004 <- df$agg_2004/df$agg_2003
...all the way up to 2020 or the latest value with "agg_" in the column name.
What I have so far is the following function:
func <- function(dat, overwrite = FALSE) {
  nms <- grep("agg_[0-9]+$", names(dat), value = TRUE)
  revnms <- gsub("agg_", "chg_", nms)
  for i = 1:ncol(df) %in% revnms{
    dat[, rvnms][i] <- lapply(dat[, rvnms][i], `/`, dat[, rvnms][i-1])
  }
  dat
}
What I am struggling with is the indexing. How do I get R to make the above calculations recursively without having to do it manually? The desired result is the "chg_" columns appended to the original dataframe:
  id quantity agg_2002 agg_2003 agg_2020 chg_2002 chg_2003 chg_2020
1  1        3        5        7        9        1     1.40     1.28
2  2        4        6        8       10        1     1.33     1.25
I would like to modify the function above to produce the desired result via lapply if possible. All ideas are welcome. Thank you.
UPDATE: I would much prefer something using lapply, or something that can accommodate differing data types.
Here is a solution with dplyr and tidyr:
library(tidyr)
library(dplyr)

dat %>%
  pivot_longer(cols = starts_with("agg"),
               names_to = "year",
               names_prefix = "agg_",
               values_to = "agg") %>%
  group_by(id) %>%
  arrange(year) %>%
  # default = first(agg) makes the first year divide by itself (1 instead of NA)
  mutate(change = agg / lag(agg, 1, default = first(agg))) %>%
  pivot_wider(names_from = year, values_from = c("agg", "change"))
You can reshape the table to long form, change the names (gsub would also work), then spread it back to wide form:
library(tidyverse)
library(stringr)

df <- dat %>%
  pivot_longer(-c(id, quantity), names_to = "agg", values_to = "year") %>%
  mutate(agg = str_replace(agg, "agg", "change")) %>%
  group_by(id) %>%
  mutate(year = ifelse(is.na(lag(year)), year/year, year/lag(year))) %>% # Divide itself if there is no lag(year)
  pivot_wider(names_from = "agg", values_from = "year")

inner_join(dat, df, by = c("id", "quantity"))
  id quantity agg_2002 agg_2003 agg_2020 change_2002 change_2003 change_2020
1  1        3        5        7        9           1    1.400000    1.285714
2  2        4        6        8       10           1    1.333333    1.250000
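Since the question asks for lapply, here is a minimal base R sketch of the same calculation. The function name add_changes and its prefix/new_prefix arguments are made up for illustration, and it assumes every "agg_" column is numeric and ends in a year. The indexing problem goes away if you loop over positions and build a vector of "previous" column names up front, with the first year paired with itself:

```r
# Toy data from the question
dat <- data.frame(id = 1:2, quantity = 3:4,
                  agg_2002 = 5:6, agg_2003 = 7:8, agg_2020 = 9:10)

# Hypothetical helper: appends one ratio column per "agg_<year>" column
add_changes <- function(dat, prefix = "agg_", new_prefix = "chg_") {
  # Find columns such as agg_2002 and sort them by year
  nms <- grep(paste0("^", prefix, "[0-9]+$"), names(dat), value = TRUE)
  if (length(nms) == 0) return(dat)
  nms <- nms[order(as.integer(sub(prefix, "", nms, fixed = TRUE)))]

  # Denominators: the first year is paired with itself, later years with the previous one
  prev <- c(nms[1], head(nms, -1))

  # Iterate over positions so that i indexes both nms and prev
  chg <- lapply(seq_along(nms), function(i) dat[[nms[i]]] / dat[[prev[i]]])
  names(chg) <- sub(prefix, new_prefix, nms, fixed = TRUE)

  # Append the new columns to the original data frame
  dat[names(chg)] <- chg
  dat
}

res <- add_changes(dat)
print(res)
```

Passing new_prefix = "change_" would give the column names used in the question text instead of "chg_". Non-numeric "agg_" columns are not handled here and would need to be converted or skipped first.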