How to compute multiple new columns with dynamic names in an R dataframe

Question

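I am trying to generate multiple new columns/variables in an R dataframe, with the new names taken dynamically from a vector. The new variables are computed from the groups/levels of a single column.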

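The dataframe contains measurements (counts) of different chemical elements (element) along depth (z). Each new variable is computed by dividing the count of every element at a given depth by the count of a proxy element (proxy) at the same depth.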
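A minimal base R sketch of that definition, using a small made-up data set (the values are hypothetical, not the question's data): for every row, `match()` looks up the proxy's count at the same depth `z`, and that count becomes the divisor.

```r
# Toy data (hypothetical values): two depths, three elements each
df_toy <- data.frame(
  z       = c(1, 1, 1, 2, 2, 2),
  element = c("Ag", "Ca", "Ti", "Ag", "Ca", "Ti"),
  counts  = c(10, 5, 2, 9, 3, 6)
)

# Rows of the proxy element: one count per depth
proxy <- "Ti"
proxy_rows <- df_toy[df_toy$element == proxy, ]

# For every row, pick the proxy count measured at the same depth z
divisor <- proxy_rows$counts[match(df_toy$z, proxy_rows$z)]

df_toy$Ti_ratio <- df_toy$counts / divisor
df_toy$Ti_ratio
# returns 5.0 2.5 1.0 1.5 0.5 1.0
```

Because the divisor is found by matching on `z` rather than by row position, the result does not depend on how the rows are sorted.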

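If I only want to create a single, explicitly named new column, there is already a solution using mutate (see the code below). I am looking for a general solution to use in a Shiny web app, where the proxy is not a single string but a vector of strings that changes dynamically with user input.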

# Working code for just one new column at a time (here Ti_ratio)

library(tidyverse)

proxies <- "Ti"
df <- tibble(z = rep(1:10, 4), element = rep(c("Ag", "Fe", "Ca", "Ti"), each = 10), counts = rnorm(40))

df_Ti <- df %>%
  group_by(z) %>%
  mutate(Ti_ratio = counts/counts[element %in% proxies])

# Not working code for multiple columns at a time

proxies <- c("Ca", "Fe", "Ti")
varname <- paste(proxies, "ratio", sep = "_")

df_ratios <- df %>%
  group_by(z) %>%
  map(~ mutate(!!varname = .x$counts/.x$counts[element %in% proxies]))

Output of the working code:

> head(df_Ti)
# A tibble: 6 x 4
# Groups:   z [6]
      z element counts Ti_ratio
  <int> <chr>    <dbl>    <dbl>
1     1 Ag       2.41     4.10 
2     2 Ag      -1.06    -0.970
3     3 Ag      -0.312   -0.458
4     4 Ag      -0.186    0.570
5     5 Ag       1.12    -1.38 
6     6 Ag      -1.68    -2.84

Expected output of the non-working code:

> head(df_ratios)
# A tibble: 6 x 6
# Groups:   z [6]
      z element counts Ca_ratio Fe_ratio Ti_ratio
  <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
1     1 Ag       2.41     4.78   -10.1      4.10 
2     2 Ag      -1.06     3.19     0.506   -0.970
3     3 Ag      -0.312   -0.479   -0.621   -0.458
4     4 Ag      -0.186   -0.296   -0.145    0.570
5     5 Ag       1.12     0.353    3.19    -1.38 
6     6 Ag      -1.68    -2.81    -0.927   -2.84
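The non-working attempt fails for two reasons: `!!varname = ...` is a syntax error (a computed name on the left-hand side needs `:=` instead of `=`), and `map()` applied to a data frame iterates over its columns, not over the proxies. One way to fix both, sketched here under the assumption that each depth has exactly one row per proxy, is to add one column per proxy with `mutate(!!name := ...)` and chain those calls with base R's `Reduce()`. The helper name `add_ratio()` and the seed are illustrative choices, not part of the question.

```r
library(dplyr)

set.seed(1)  # illustrative seed, only to make the example reproducible
df <- tibble(z = rep(1:10, 4),
             element = rep(c("Ag", "Fe", "Ca", "Ti"), each = 10),
             counts = rnorm(40))
proxies <- c("Ca", "Fe", "Ti")

# Hypothetical helper: adds one "<proxy>_ratio" column.
# `!!name := value` lets the column name be computed from a string.
add_ratio <- function(data, proxy) {
  data %>%
    group_by(z) %>%
    mutate(!!paste(proxy, "ratio", sep = "_") := counts / counts[element == !!proxy]) %>%
    ungroup()
}

# Call add_ratio() once per proxy, feeding each result into the next call
df_ratios <- Reduce(add_ratio, proxies, init = df)
```

Since `proxies` is just a character vector, it could come directly from a Shiny input; `Reduce()` adds as many `_ratio` columns as there are entries.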

Edit: I found a general base R solution using two nested for loops, similar to the answer posted by @fra (the difference is that I loop over both depth and proxies):

library(tidyverse)
df <- tibble(z = rep(1:3, 4), element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3), counts = runif(12)) %>% arrange(z, element)
proxies <- c("Ca", "Fe", "Ti")

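for (f in seq_along(proxies)) {
  proxy <- proxies[f]
  tmp2 <- NULL
  # Loop over depths; this relies on df being sorted by z (see arrange() above)
  # so that the stacked ratios line up with the rows of df
  for (i in unique(df$z)) {
    tmp <- df[df$z == i, ]
    # Divide every count at depth i by the proxy's count at the same depth
    tmp <- as.data.frame(tmp$counts / tmp$counts[tmp$element %in% proxy])
    names(tmp) <- paste(proxy, "ratio", sep = "_")
    tmp2 <- rbind(tmp2, tmp)
  }
  # Store the result as a new column named e.g. "Ca_ratio"
  df[[names(tmp2)]] <- tmp2[[1]]
}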
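The inner loop over depths can also be written with base R's `split()`/`unsplit()`: split the rows by `z`, divide within each group, then put the results back in the original row order. This is a sketch assuming exactly one row per proxy at each depth; unlike the nested loop, it does not require `df` to be sorted by `z` (the toy data below is deliberately left unsorted, and the seed is arbitrary).

```r
# Same shape as the question's edit: 3 depths x 4 elements, not sorted by z
set.seed(42)  # arbitrary seed for reproducibility
df <- data.frame(z = rep(1:3, 4),
                 element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
                 counts = runif(12))
proxies <- c("Ca", "Fe", "Ti")

for (proxy in proxies) {
  # split() groups the rows by depth; each group is divided by its own proxy count
  by_depth <- lapply(split(df, df$z),
                     function(g) g$counts / g$counts[g$element == proxy])
  # unsplit() writes the per-depth results back in df's original row order
  df[[paste(proxy, "ratio", sep = "_")]] <- unsplit(by_depth, df$z)
}
```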

And the correct output:

> head(df)
# A tibble: 6 x 6
      z element counts Ca_ratio Fe_ratio Ti_ratio
  <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
1     1 Ag      0.690    0.864      9.21    1.13 
2     1 Ca      0.798    1         10.7     1.30 
3     1 Fe      0.0749   0.0938     1       0.122
4     1 Ti      0.612    0.767      8.17    1    
5     2 Ag      0.687    0.807      3.76    0.730
6     2 Ca      0.851    1          4.66    0.904

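I made the dataframe smaller so that it is easy to see why the solution is correct (the ratio of an element to itself is 1). I am still interested in a more elegant solution that can be used with pipes.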
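One pipe-friendly alternative (a different technique from the answers below, sketched here rather than taken from them) is to reshape the proxy rows to wide format, join them back by depth, and let `across()` build the `_ratio` names. It assumes dplyr >= 1.0 (for `across()` and its `.names` argument), tidyr >= 1.0 (for `pivot_wider()`), and exactly one measurement per proxy and depth; duplicates would turn the proxy columns into list-columns.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- tibble(z = rep(1:3, 4),
             element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
             counts = runif(12))
proxies <- c("Ca", "Fe", "Ti")

df_ratios <- df %>%
  # attach one column per proxy holding its count at each depth
  left_join(
    df %>%
      filter(element %in% proxies) %>%
      pivot_wider(id_cols = z, names_from = element, values_from = counts),
    by = "z"
  ) %>%
  # divide every count by each proxy column; new names are e.g. "Ca_ratio"
  mutate(across(all_of(proxies), ~ counts / .x, .names = "{.col}_ratio")) %>%
  # drop the temporary proxy columns
  select(-all_of(proxies)) %>%
  arrange(z, element)
```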

Answer 1

Using base R:

proxies <- c("Ca", "Fe", "Ti")

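for (f in proxies) {
   # df$counts[df$element %in% f] holds the proxy's counts for z = 1..10; it is
   # recycled across the element blocks, which lines up with the right depth
   # only because every element block in df is ordered by z in the same way
   newDF <- as.data.frame(df$counts / df$counts[df$element %in% f])
   names(newDF) <- paste(f, "ratio", sep = "_")
   df <- cbind(df, newDF)
}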

> df
    z element      counts    Ca_ratio    Fe_ratio    Ti_ratio
1   1      Ag -0.40163072 -0.35820754   1.7375395  0.45692965
2   2      Ag -1.00880171  1.27798430  22.8520332 -2.84599471
3   3      Ag  0.72230855 -1.19506223   6.3893485 -0.73558507
4   4      Ag -1.71524002 -1.38942436   1.7564861 -3.03313134
5   5      Ag -0.30813737  1.08127226   4.1985801 -0.33008370
6   6      Ag  0.20524663  0.08910397  -0.3132916 -0.23778331
...

Answer 2

A tidyverse option could be to write a function similar to your original code and then use map_dfc to create the new columns.

library(tidyverse)

proxies <- c("Ca", "Fe", "Ti")

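# Compute "<x>_ratio" for a single proxy x (note: it reads the global df)
your_func <- function(x){

    df %>% 
       group_by(z) %>%
       # := lets the new column name be built from a string
       mutate(!!paste(x, "ratio", sep = "_") := counts/counts[element %in% !!x]) %>% 
       ungroup() %>%
       # keep only the new column so map_dfc() can bind the results side by side
       select(!!paste(x, "ratio", sep = "_") )
}

df %>% 
   # map_dfc() calls your_func() once per proxy and binds the columns together
   group_modify(~map_dfc(proxies, your_func)) %>% 
   bind_cols(df, .) %>%
   arrange(z, element)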


#       z element  counts Ca_ratio Fe_ratio Ti_ratio
#   <int> <chr>     <dbl>    <dbl>    <dbl>    <dbl>
# 1     1 Ag      -0.112   -0.733    -0.197   -1.51 
# 2     1 Ca       0.153    1         0.269    2.06 
# 3     1 Fe       0.570    3.72      1        7.66 
# 4     1 Ti       0.0743   0.485     0.130    1    
# 5     2 Ag       0.881    0.406    -6.52    -1.49 
# 6     2 Ca       2.17     1       -16.1     -3.69 
# 7     2 Fe      -0.135   -0.0622    1        0.229
# 8     2 Ti      -0.590   -0.271     4.37     1    
# 9     3 Ag       0.398    0.837     0.166   -0.700
#10     3 Ca       0.476    1         0.198   -0.836
# ... with 30 more rows

Answer 1: score 1, posted 2019-10-31 10:53:18
Answer 2: score 1, accepted, posted 2019-11-01 18:37:24