How to compute multiple new columns in an R dataframe with dynamic names

Question

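I am trying to generate multiple new columns/variables in an R dataframe, with the dynamic new names taken from a vector. The new variables are computed from groups/levels of a single column.

The dataframe contains measurements (counts) of different chemical elements (element) along depth (z). Each new variable is computed by dividing the counts of every element at a given depth by the counts of a proxy element (proxies) at the same depth.

A solution using mutate already exists if I only want to create one new column and name it explicitly (see the code below). I am looking for a general solution for use in a shiny web application, where proxies is not a single string but a vector of strings that changes dynamically with user input.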

# Working code for just one new column at a time (here Ti_ratio)

library(tidyverse)

proxies <- "Ti"
df <- tibble(z = rep(1:10, 4), element = rep(c("Ag", "Fe", "Ca", "Ti"), each = 10), counts = rnorm(40))

df_Ti <- df %>%
  group_by(z) %>%
  mutate(Ti_ratio = counts/counts[element %in% proxies])

# Not working code for multiple columns at a time

proxies <- c("Ca", "Fe", "Ti")
varname <- paste(proxies, "ratio", sep = "_")

df_ratios <- df %>%
  group_by(z) %>%
  map(~ mutate(!!varname = .x$counts/.x$counts[element %in% proxies]))

Output of the working code:

> head(df_Ti)
# A tibble: 6 x 4
# Groups:   z [6]
      z element counts Ti_ratio
  <int> <chr>    <dbl>    <dbl>
1     1 Ag       2.41     4.10 
2     2 Ag      -1.06    -0.970
3     3 Ag      -0.312   -0.458
4     4 Ag      -0.186    0.570
5     5 Ag       1.12    -1.38 
6     6 Ag      -1.68    -2.84

Expected output of the non-working code:

> head(df_ratios)
# A tibble: 6 x 6
# Groups:   z [6]
      z element counts Ca_ratio Fe_ratio Ti_ratio
  <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
1     1 Ag       2.41     4.78   -10.1      4.10 
2     2 Ag      -1.06     3.19     0.506   -0.970
3     3 Ag      -0.312   -0.479   -0.621   -0.458
4     4 Ag      -0.186   -0.296   -0.145    0.570
5     5 Ag       1.12     0.353    3.19    -1.38 
6     6 Ag      -1.68    -2.81    -0.927   -2.84

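Edit: I found a general base R solution to the problem using two nested for loops, similar to the answer posted by @fra (the difference is that I loop over both depth and proxy):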

library(tidyverse)
df <- tibble(z = rep(1:3, 4), element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3), counts = runif(12)) %>% arrange(z, element)
proxies <- c("Ca", "Fe", "Ti")

for (f in seq_along(proxies)) {
  proxy <- proxies[f]
  tmp2 <- NULL
  for (i in unique(df$z)) {
    tmp <- df[df$z == i,]
    tmp <- as.data.frame(tmp$counts/tmp$counts[tmp$element %in% proxy])
    names(tmp) <- paste(proxy, "ratio", sep = "_")
    tmp2 <- rbind(tmp2, tmp)
  }
  df[, 3 + f] <- tmp2
}

And the correct output:

> head(df)
# A tibble: 6 x 6
      z element counts Ca_ratio Fe_ratio Ti_ratio
  <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
1     1 Ag      0.690    0.864      9.21    1.13 
2     1 Ca      0.798    1         10.7     1.30 
3     1 Fe      0.0749   0.0938     1       0.122
4     1 Ti      0.612    0.767      8.17    1    
5     2 Ag      0.687    0.807      3.76    0.730
6     2 Ca      0.851    1          4.66    0.904

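I made the dataframe smaller so that it is easy to see why the solution is correct (the ratio of an element to itself is 1). I am still interested in a more elegant solution that works with pipes.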
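The "ratio of an element to itself is 1" check can also be automated. The following is only a sketch: `check_self_ratio()` is a hypothetical helper name, and the small `demo` data frame is made up for illustration, assuming the ratio columns follow the `<proxy>_ratio` naming used above.

```r
# Hypothetical helper: for each proxy, test whether the <proxy>_ratio column
# equals 1 on the rows that belong to that proxy element.
check_self_ratio <- function(data, proxies) {
  vapply(proxies, function(proxy) {
    col <- paste(proxy, "ratio", sep = "_")
    own_rows <- data$element == proxy
    isTRUE(all.equal(data[[col]][own_rows], rep(1, sum(own_rows))))
  }, logical(1))
}

# Made-up example data with two depths and two elements
demo <- data.frame(
  z = c(1, 1, 2, 2),
  element = c("Ca", "Fe", "Ca", "Fe"),
  counts = c(2, 4, 3, 6)
)
demo$Ca_ratio <- c(1, 2, 1, 2)       # counts divided by the Ca count per depth
demo$Fe_ratio <- c(0.5, 1, 0.5, 1)   # counts divided by the Fe count per depth

check_self_ratio(demo, c("Ca", "Fe"))
```

The result is a named logical vector with one entry per proxy.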

Answer 1

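Using base R:

proxies <- c("Ca", "Fe", "Ti")

# Note: there is no grouping by z here. The proxy counts (one per depth) are
# recycled across all rows, which lines up only because df is ordered by
# element with z running 1:10 inside each element.
for(f in proxies){
   # Ratio of every count to the proxy's count
   newDF <- as.data.frame(df$counts/df$counts[df$element %in% f])
   names(newDF) <- paste(f, "ratio", sep = "_")
   df <- cbind(df,newDF)
}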

> df
    z element      counts    Ca_ratio    Fe_ratio    Ti_ratio
1   1      Ag -0.40163072 -0.35820754   1.7375395  0.45692965
2   2      Ag -1.00880171  1.27798430  22.8520332 -2.84599471
3   3      Ag  0.72230855 -1.19506223   6.3893485 -0.73558507
4   4      Ag -1.71524002 -1.38942436   1.7564861 -3.03313134
5   5      Ag -0.30813737  1.08127226   4.1985801 -0.33008370
6   6      Ag  0.20524663  0.08910397  -0.3132916 -0.23778331
...
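A minimal sketch of a base R variant that does not depend on row order: it looks up the proxy count for each row's depth with `match()`. It assumes exactly one row per combination of `z` and `element`; the seed and the shuffling step are there only for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(
  z = rep(1:3, 4),
  element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
  counts = runif(12)
)
df <- df[sample(nrow(df)), ]  # shuffle rows to show that the order does not matter
proxies <- c("Ca", "Fe", "Ti")

for (proxy in proxies) {
  # Rows of the proxy element: one count per depth (assumption stated above)
  ref <- df[df$element == proxy, c("z", "counts")]
  # For every row, find the proxy count measured at the same depth
  proxy_counts <- ref$counts[match(df$z, ref$z)]
  df[[paste(proxy, "ratio", sep = "_")]] <- df$counts / proxy_counts
}

head(df[order(df$z, df$element), ])
```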

Answer 2

One tidyverse option would be to create a function, similar to your original code, and then use map_dfc to create the new columns.

library(tidyverse)

proxies <- c("Ca", "Fe", "Ti")

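# For one proxy x, return a one-column tibble holding the ratio column x_ratio
your_func <- function(x){

    df %>% 
       group_by(z) %>%
       # := allows a dynamically built name on the left-hand side
       mutate(!!paste(x, "ratio", sep = "_") := counts/counts[element %in% !!x]) %>% 
       ungroup() %>%
       select(!!paste(x, "ratio", sep = "_") )
}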

df %>% 
   group_modify(~map_dfc(proxies, your_func)) %>% 
   bind_cols(df, .) %>%
   arrange(z, element)


#       z element  counts Ca_ratio Fe_ratio Ti_ratio
#   <int> <chr>     <dbl>    <dbl>    <dbl>    <dbl>
# 1     1 Ag      -0.112   -0.733    -0.197   -1.51 
# 2     1 Ca       0.153    1         0.269    2.06 
# 3     1 Fe       0.570    3.72      1        7.66 
# 4     1 Ti       0.0743   0.485     0.130    1    
# 5     2 Ag       0.881    0.406    -6.52    -1.49 
# 6     2 Ca       2.17     1       -16.1     -3.69 
# 7     2 Fe      -0.135   -0.0622    1        0.229
# 8     2 Ti      -0.590   -0.271     4.37     1    
# 9     3 Ag       0.398    0.837     0.166   -0.700
#10     3 Ca       0.476    1         0.198   -0.836
# ... with 30 more rows
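Another way to keep everything in a single pipe is to add the columns one at a time with `purrr::reduce()`. This is only a sketch, not the method of either answer: `add_ratio()` is a hypothetical helper name, and the example data and seed are made up for illustration.

```r
library(tidyverse)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- tibble(
  z = rep(1:3, 4),
  element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
  counts = runif(12)
)
proxies <- c("Ca", "Fe", "Ti")

# Hypothetical helper: append one <proxy>_ratio column, computed per depth
add_ratio <- function(data, proxy) {
  new_name <- paste(proxy, "ratio", sep = "_")
  data %>%
    group_by(z) %>%
    mutate(!!new_name := counts / counts[element == proxy]) %>%
    ungroup()
}

# reduce() passes the result of each call on to the next, starting from df
df_ratios <- reduce(proxies, add_ratio, .init = df) %>%
  arrange(z, element)

head(df_ratios)
```

Because `proxies` is an ordinary character vector, it could in principle come from user input in a shiny app, as described in the question.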

Answer 1
1 2019-10-31 10:53:18

Answer 2
1 accepted 2019-11-01 18:37:24