Creating multiple new columns using mutate() and across() in R

I would like to perform the following calculation on many columns at once, with the data grouped by ID:

df <- df %>%
  group_by(ID) %>%
  mutate(Flows.2018.04 = Assets.2018.04 -
           (Assets.2018.03 * Returns.2018.04))

The data set contains an Assets.YYYY.MM and a Returns.YYYY.MM column for each month from 2018.04 to 2022.02, and I would like to create a Flows column for each of those months.

I know that I could do it like this for every column:

df <- df %>%
  group_by(ID) %>%
  mutate(Flows.2018.04 = Assets.2018.04 -
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate(Flows.2018.05 = Assets.2018.05 -
           (Assets.2018.04 * Returns.2018.05))

But since I want to do this calculation for 50+ columns, I was hoping there is a more elegant way. To my knowledge it should be possible with the dplyr across() function, but I was not able to figure out how to do this.

I would like the new columns to be named Flows.YYYY.MM, which complicates the issue further. I thought that the easiest way to achieve this might be to simply rename the columns after creating them.

I have also thought about converting the data frame from wide format to long format to perform this calculation; however, this seemed even more complicated to me.

Any suggestions on achieving the desired outcome?

Please find below the sample data, as requested:

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

df
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55           5000           2345           3459            1.03            0.94
2 6F55           3000           1926           6933            0.77            1.11
3 ANE3           5870           8563           1533            1.01            0.89
4 ANE3           4098           9373           4556            0.97            1.02
5 6F55           9878           7432           9855            1.06            1.02

The desired outcome is:

  ID    Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55            5000           2345           3459            1.03            0.94        -2805          1255
2 6F55            3000           1926           6933            0.77            1.11         -384          4795
3 ANE3            5870           8563           1533            1.01            0.89         2634         -6088
4 ANE3            4098           9373           4556            0.97            1.02         5398         -5004
5 6F55            9878           7432           9855            1.06            1.02        -3039          2274

How about this:

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))


df %>% 
  pivot_longer(-ID, 
               names_to = c(".value", "date"), 
               names_pattern= "(.*)\\.(\\d{4}\\.\\d{2})") %>% 
  arrange(ID, date) %>% 
  group_by(ID, date) %>% 
  mutate(obs = seq_along(date)) %>% 
  group_by(ID, obs) %>% 
  mutate(Flow = Assets - (lag(Assets)*Returns)) %>% 
  pivot_wider(names_from = "date", 
              values_from = c("Assets", "Returns", "Flow")) %>% 
  as.data.frame()
#>     ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55   1           5000           2345           3459              NA
#> 2 6F55   2           3000           1926           6933              NA
#> 3 6F55   3           9878           7432           9855              NA
#> 4 ANE3   1           5870           8563           1533              NA
#> 5 ANE3   2           4098           9373           4556              NA
#>   Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1            1.03            0.94           NA     -2805.00      1254.70
#> 2            0.77            1.11           NA      -384.00      4795.14
#> 3            1.06            1.02           NA     -3038.68      2274.36
#> 4            1.01            0.89           NA      2634.30     -6088.07
#> 5            0.97            1.02           NA      5397.94     -5004.46
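To make the reshape step easier to follow, here is a minimal base R sketch that applies the same regular expression outside of pivot_longer(). The nms vector is purely illustrative, and perl = TRUE is an assumption added so that \\d is interpreted the same way on every platform; tidyr uses its own regex engine internally.

```r
# Illustrative column names taken from the sample data
nms <- c("Assets.2018.03", "Assets.2018.04", "Returns.2018.04")

# Same pattern as the one passed to names_pattern above
pattern <- "(.*)\\.(\\d{4}\\.\\d{2})"

# First capture group: the variable name, which ".value" turns into columns
value_part <- sub(pattern, "\\1", nms, perl = TRUE)

# Second capture group: the YYYY.MM part, which ends up in the "date" column
date_part <- sub(pattern, "\\2", nms, perl = TRUE)

print(data.frame(name = nms, value = value_part, date = date_part))
```

In the pipeline above, the obs counter numbers the rows within each ID and date, so that lag(Assets) steps back one month for the same original row and pivot_wider() can tell repeated IDs apart. Note that the result is sorted by ID and that the new columns are named with an underscore (Flow_2018.04); passing names_sep = "." to pivot_wider() should give dot-separated names, although the prefix would still be Flow rather than Flows.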

Created on 2022-04-10 by the reprex package (v2.0.1)
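If staying in wide format is preferable, the following is a rough base R sketch of an alternative. It is not the across() approach the question asks about; it relies on data frame arithmetic instead. It assumes that there is exactly one more Assets column than Returns columns, that the months are consecutive, and that the YYYY.MM suffixes sort chronologically as text. Because the formula only uses values from the same row, the sketch does no grouping by ID. The names asset_cols, ret_cols, curr, prev, flows, and result are introduced here for illustration only.

```r
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

# Collect the column names; sorting puts the months in chronological order
asset_cols <- sort(grep("^Assets\\.", names(df), value = TRUE))
ret_cols   <- sort(grep("^Returns\\.", names(df), value = TRUE))

# Assumption: Assets has one extra (earliest) month compared with Returns
stopifnot(length(asset_cols) == length(ret_cols) + 1)

curr <- df[asset_cols[-1]]                   # Assets for month t
prev <- df[asset_cols[-length(asset_cols)]]  # Assets for month t - 1

# Element-wise arithmetic on equally sized data frames, matched by position
flows <- curr - prev * df[ret_cols]
names(flows) <- sub("^Returns", "Flows", ret_cols)

result <- cbind(df, flows)
print(result)
```

This keeps the original row order and produces the Flows.YYYY.MM names directly, at the cost of depending on column order rather than matching months explicitly.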
