R dataframe: create multiple new columns from existing columns using across / all_of / mutate_if

I have a dataframe (sample below) that holds responses to a questionnaire collected over multiple days.

        > df %>% 
            mutate (Sigma_Bucket_Q1  = if_else(Sigma_Q1 >= Median_Sigma_Q1, 
                    "Above Median Volatility", "Below Median Volatility"))
    # A tibble: 19 x 12
       UserId Days_From_First_Use    Q1    Q2    Q3 Sigma_Q1 Sigma_Q2 Sigma_Q3 Median_Sigma_Q1 Median_Sigma_Q2 Median_Sigma_Q3 Sigma_Bucket_Q1        
       <fct>                <int> <int> <int> <int>    <dbl>    <dbl>    <dbl>           <dbl>           <dbl>           <dbl> <chr>                  
     1 A                        0     3     2     1     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     2 A                        1     1     0     0     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     3 A                        2     1     1     0     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     4 A                        3     0     2     0     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     5 A                        4     1     1     1     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     6 B                        0     4     8     2     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
     7 B                        2     2     2     1     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
     8 B                        4     5     6     5     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
     9 B                        5     4     5     5     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
    10 C                        0     5     7     2     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    11 C                        1     2     2     2     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    12 C                        2     5     5     4     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    13 C                        3     6     5     3     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    14 C                        4     6     6     4     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    15 D                        0     5     3     5     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    16 D                        1     5     3     4     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    17 D                        2     4     2     6     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    18 D                        3     0     0     1     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    19 D                        4     1     1     1     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility

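Q1, Q2 and Q3 hold the responses, while Sigma_Q1, Sigma_Q2 and Sigma_Q3 hold the time-series standard deviation of each subject's responses to each question. Median_Sigma_Q1, Median_Sigma_Q2 and Median_Sigma_Q3 hold the median, across subjects, of those standard deviations for Q1, Q2 and Q3. I want to classify each subject as an above-median or below-median volatility subject according to whether Sigma_Q1 >= Median_Sigma_Q1, and so on. My expression for generating Sigma_Bucket_Q1 works fine; it is visible just above the tibble.

But when I try to generalize it to generate all the Sigma_Buckets at once (my real problem has 21 such names), I run into a problem. I tried: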

        df %>% 
  mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) = if_else(paste0("Sigma_", {.col}) >= paste0("Median_Sigma_",  {.col}), 
          "Above Median Volatility", "Below Median Volatility")))

I get a cryptic error message and cannot work out what I need to fix:

> df %>% 
+   mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) = if_else(paste0("Sigma_", {.col}) >= paste0("Median_Sigma_",  {.col}), 
Error: unexpected '=' in:
"df %>% 
  mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) ="
>           "Above Median Volatility", "Below Median Volatility")))
Error: unexpected ',' in "          "Above Median Volatility","

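How can I modify my statement so that it handles all 3 columns (all 21 columns in the real problem) without writing one line per question?

Browsing various answers on StackOverflow suggests that mutate_if might be the basis of a solution, but I cannot see how to use it in this particular setting.

Many thanks in advance for your help

Thomas Philips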

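The function you pass to across receives only the column values, not the column names (inside across, the current name is available only through cur_column()), so {.col} cannot be used to build the name of the matching Median_ column. In addition, a call such as all_of(...) cannot appear on the left-hand side of = inside a function call, which is what triggers the "unexpected '='" parse error. You can try this vectorised base R approach, which needs no loops.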

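# \\d+ (rather than \\d) so that two-digit questions such as Q10 to Q21 also match
col1 <- grep('^Sigma_Q\\d+$', names(df), value = TRUE)
col2 <- grep('^Median_Sigma_Q\\d+$', names(df), value = TRUE)

# (df[col1] >= df[col2]) is a logical matrix; adding 1 maps FALSE/TRUE to index 1/2
df[paste0(col1, '_Bucket')] <- c("Below Median Volatility", "Above Median Volatility")[(df[col1] >= df[col2]) + 1]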
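
For readers who would rather stay within dplyr, the same pairing can be sketched with across() by looking up the current column name with cur_column() and fetching the matching Median_ column through the .data pronoun. This is only a sketch, assuming dplyr 1.0.0 or later; the toy df below uses one row per user with the values from the question, and the output names follow the Sigma_Q1_Bucket pattern of the base R code rather than Sigma_Bucket_Q1.

```r
library(dplyr)

# Toy data: one row per user, values copied from the question's sample.
df <- tibble(
  UserId          = c("A", "B", "C", "D"),
  Sigma_Q1        = c(1.10, 1.26, 1.64, 2.35),
  Sigma_Q2        = c(0.837, 2.50, 1.87, 1.30),
  Median_Sigma_Q1 = 1.45,
  Median_Sigma_Q2 = 1.59
)

result <- df %>%
  mutate(across(
    # Select Sigma_Q1, Sigma_Q2, ... but not the Median_Sigma_* columns.
    matches("^Sigma_Q\\d+$"),
    # cur_column() is the name of the column being processed,
    # so "Median_" + that name is its partner column.
    ~ if_else(.x >= .data[[paste0("Median_", cur_column())]],
              "Above Median Volatility", "Below Median Volatility"),
    # Name the new columns Sigma_Q1_Bucket, Sigma_Q2_Bucket, ...
    .names = "{.col}_Bucket"
  ))

print(result)
```

Because the partner column is found by name rather than by position, this version does not depend on the Sigma_ and Median_Sigma_ columns appearing in the same order.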

Here is a solution using map:

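library(tidyverse)  # purrr (map2_df), dplyr (select, if_else, rename_with), stringr (str_replace)

# The two selections are paired column by column, in order:
# Sigma_Q1 with Median_Sigma_Q1, Sigma_Q2 with Median_Sigma_Q2, ...
map2_df(
    df %>% select(starts_with("Sigma_Q")), 
    df %>% select(starts_with("Median_Sigma_Q")),
    ~if_else(.x >= .y, "Above Median Volatility", "Below Median Volatility")) %>%
  rename_with(~str_replace(.x, "Sigma", "Sigma_Bucket"))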

Output:

# A tibble: 19 x 3
   Sigma_Bucket_Q1         Sigma_Bucket_Q2         Sigma_Bucket_Q3        
   <chr>                   <chr>                   <chr>                  
 1 Below Median Volatility Below Median Volatility Below Median Volatility
 2 Below Median Volatility Below Median Volatility Below Median Volatility
 3 Below Median Volatility Below Median Volatility Below Median Volatility
 4 Below Median Volatility Below Median Volatility Below Median Volatility
 5 Below Median Volatility Below Median Volatility Below Median Volatility
 6 Below Median Volatility Above Median Volatility Above Median Volatility
 7 Below Median Volatility Above Median Volatility Above Median Volatility
 8 Below Median Volatility Above Median Volatility Above Median Volatility
 9 Below Median Volatility Above Median Volatility Above Median Volatility
10 Above Median Volatility Above Median Volatility Below Median Volatility
11 Above Median Volatility Above Median Volatility Below Median Volatility
12 Above Median Volatility Above Median Volatility Below Median Volatility
13 Above Median Volatility Above Median Volatility Below Median Volatility
14 Above Median Volatility Above Median Volatility Below Median Volatility
15 Above Median Volatility Below Median Volatility Above Median Volatility
16 Above Median Volatility Below Median Volatility Above Median Volatility
17 Above Median Volatility Below Median Volatility Above Median Volatility
18 Above Median Volatility Below Median Volatility Above Median Volatility
19 Above Median Volatility Below Median Volatility Above Median Volatility
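
The result above contains only the three bucket columns. If they should sit alongside the original data, as in the question's Sigma_Bucket_Q1 example, one possible sketch is to bind them back with bind_cols(). It uses map2() followed by as_tibble() in place of map2_df(), which purrr 1.0.0 marks as superseded; the toy df (one row per user, values from the question) and the names buckets and df_with_buckets are illustrative assumptions.

```r
library(dplyr)
library(purrr)

# Toy data: one row per user, values copied from the question's sample.
df <- tibble(
  UserId          = c("A", "B", "C", "D"),
  Sigma_Q1        = c(1.10, 1.26, 1.64, 2.35),
  Sigma_Q2        = c(0.837, 2.50, 1.87, 1.30),
  Median_Sigma_Q1 = 1.45,
  Median_Sigma_Q2 = 1.59
)

# Compare each Sigma_Q* column with the Median_Sigma_Q* column in the
# same position; the result is a list named after the Sigma_Q* columns.
buckets <- map2(
  select(df, starts_with("Sigma_Q")),
  select(df, starts_with("Median_Sigma_Q")),
  ~ if_else(.x >= .y, "Above Median Volatility", "Below Median Volatility")
) %>%
  as_tibble() %>%
  # Sigma_Q1 -> Sigma_Bucket_Q1, and so on.
  rename_with(~ sub("^Sigma_", "Sigma_Bucket_", .x))

# Attach the new columns to the original data.
df_with_buckets <- bind_cols(df, buckets)

print(df_with_buckets)
```

The pairing here is positional, so it relies on the Sigma_Q* and Median_Sigma_Q* columns being stored in the same question order.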
