Creating multiple new columns using mutate() and across() in R
I would like to perform the following calculation on many columns at the same time while they are grouped by ID:
df <- df %>%
  group_by(ID) %>%
  mutate("Flows.2018.04" = Assets.2018.04 -
           (Assets.2018.03 * Returns.2018.04))
The data set contains an Assets.YYYY.MM and a Returns.YYYY.MM column for each month from 2018.04 to 2022.02, and I would like to create a Flows column for each of those months.
I know that I could do it like this for every column:
df <- df %>%
  group_by(ID) %>%
  mutate("Flows.2018.04" = Assets.2018.04 -
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate("Flows.2018.05" = Assets.2018.05 -
           (Assets.2018.04 * Returns.2018.05))
But as I want to do this calculation for 50+ columns, I was hoping there is a more elegant way. To my knowledge it should be possible with the dplyr across() function, but I was not able to figure out how to do this.
I would like the new columns to be named Flows.YYYY.MM, which complicates the issue further. I thought that the easiest way to achieve this might be to simply rename the columns after creating them.
I have also thought about converting the data frame from wide format to long format to perform this calculation; however, this seemed even more complicated to me.
Any suggestions on achieving the desired outcome?
Please find below the sample data, as requested:
library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))
df
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55           5000           2345           3459            1.03            0.94
2 6F55           3000           1926           6933            0.77            1.11
3 ANE3           5870           8563           1533            1.01            0.89
4 ANE3           4098           9373           4556            0.97            1.02
5 6F55           9878           7432           9855            1.06            1.02
The desired outcome is:
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55           5000           2345           3459            1.03            0.94         -2805          1255
2 6F55           3000           1926           6933            0.77            1.11          -384          4795
3 ANE3           5870           8563           1533            1.01            0.89          2634         -6088
4 ANE3           4098           9373           4556            0.97            1.02          5398         -5004
5 6F55           9878           7432           9855            1.06            1.02         -3039          2274
How about this: reshape to long format, compute the flow with lag(), then reshape back to wide.
library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))
df %>%
  # Reshape to long format: one row per original row and month,
  # with one Assets and one Returns column
  pivot_longer(-ID,
               names_to = c(".value", "date"),
               names_pattern = "(.*)\\.(\\d{4}\\.\\d{2})") %>%
  arrange(ID, date) %>%
  # Number the original rows within each ID so they stay distinct
  group_by(ID, date) %>%
  mutate(obs = seq_along(date)) %>%
  # Within one original row, lag(Assets) is the previous month's Assets
  group_by(ID, obs) %>%
  mutate(Flow = Assets - (lag(Assets) * Returns)) %>%
  # Reshape back to wide format: one column per variable and month
  pivot_wider(names_from = "date",
              values_from = c("Assets", "Returns", "Flow")) %>%
  as.data.frame()
#>     ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55   1           5000           2345           3459              NA
#> 2 6F55   2           3000           1926           6933              NA
#> 3 6F55   3           9878           7432           9855              NA
#> 4 ANE3   1           5870           8563           1533              NA
#> 5 ANE3   2           4098           9373           4556              NA
#>   Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1            1.03            0.94           NA     -2805.00      1254.70
#> 2            0.77            1.11           NA      -384.00      4795.14
#> 3            1.06            1.02           NA     -3038.68      2274.36
#> 4            1.01            0.89           NA      2634.30     -6088.07
#> 5            0.97            1.02           NA      5397.94     -5004.46
Created on 2022-04-10 by the reprex package (v2.0.1)
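Since the question asks about across() specifically, here is a minimal sketch that stays in wide format and names the new columns Flows.YYYY.MM directly. It is not part of the answer above. It assumes that every Returns.YYYY.MM column has a matching Assets.YYYY.MM column plus an Assets column for the month before, and that the zero-padded YYYY.MM suffixes sort chronologically as text. The names assets_cols and prev_assets() are hypothetical helpers introduced only for this illustration.

```r
library(dplyr)

df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02)
)

# Assets column names in chronological order
# (assumption: zero-padded YYYY.MM suffixes sort correctly as text)
assets_cols <- sort(grep("^Assets\\.", names(df), value = TRUE))

# Hypothetical helper: name of the Assets column one month before `col`
prev_assets <- function(col) assets_cols[match(col, assets_cols) - 1]

result <- df %>%
  mutate(across(
    starts_with("Returns."),
    ~ {
      # cur_column() is the name of the Returns column being processed
      cur <- sub("^Returns", "Assets", cur_column())
      # Flows = this month's Assets - last month's Assets * this month's Returns
      .data[[cur]] - .data[[prev_assets(cur)]] * .x
    },
    # Build the new name by swapping the prefix of the Returns column name
    .names = "{sub('^Returns', 'Flows', .col)}"
  ))

result
```

The calculation only combines values from the same row, so no group_by() is needed here. The Flows values are not rounded; wrap the expression in round() if whole numbers are wanted, as in the desired outcome.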