
Combine: rowwise(), mutate(), across(), for multiple functions

This is somewhat related to this question: in principle, I am trying to understand how row-wise operations with mutate() across multiple columns work when applying more than one function (such as mean(), sum(), min(), etc.).

I have learned that across does this job and not c_across. I have also learned that mean() differs from min() in that mean() does not work on data frames, so the data has to be converted to a vector first, which can be done with unlist or as.matrix (learned from Ronak Shah in "Understanding rowwise() and c_across()").
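The difference between mean() and min() on a data frame can be checked in base R. This is a minimal sketch on a hypothetical all-numeric data frame called num_df, not the df from the question:

```r
# Hypothetical all-numeric data frame, used only for illustration.
num_df <- data.frame(a = 1:3, b = 4:6)

# min() has a data.frame method and looks at all cells.
smallest <- min(num_df)

# mean() has no data.frame method: it warns and returns NA.
avg_df <- suppressWarnings(mean(num_df))

# Flattening to a vector (or matrix) first gives the overall mean.
avg_vec <- mean(unlist(num_df))
avg_mat <- mean(as.matrix(num_df))

print(c(smallest = smallest, avg_df = avg_df, avg_vec = avg_vec, avg_mat = avg_mat))
```

Both unlist() and as.matrix() only help while every column is numeric; with a character column mixed in, the whole result becomes character.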

Now to my actual case: I was able to do this task, but I lose one column, d. How can I avoid losing column d in this setting?

My df:

df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b", 
"c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

Does not work:

df %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE), 
         max = max(unlist(cur_data()), na.rm = TRUE)
  )

# Output:
      a     b     c d         e   avg min   max  
  <int> <int> <int> <chr> <int> <dbl> <chr> <chr>
1     1     6    11 a         1    NA 1     a    
2     2     7    12 b         2    NA 12    b    
3     3     8    13 c         3    NA 13    c    
4     4     9    14 d         4    NA 14    d    
5     5    10    15 e         5    NA 10    e 
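A plausible reading of the NA and <chr> results above is that unlist() coerces the whole row to character as soon as the character column d is included. The sketch below reproduces that in base R for the second row; row2 is a hand-built stand-in for what cur_data() would return, which is an assumption rather than something taken from dplyr itself:

```r
# Hand-built stand-in for the second row of df (assumption for illustration).
row2 <- list(a = 2L, b = 7L, c = 12L, d = "b", e = 2L)

# One character element is enough to turn every value into a string.
vals <- unlist(row2)
vals_class <- class(vals)

# mean() cannot average strings: it warns and returns NA.
avg <- suppressWarnings(mean(vals, na.rm = TRUE))

# min() and max() still run, but they compare strings, not numbers,
# so "12" typically sorts before "2" and letters sort after digits.
lowest <- min(vals)
highest <- max(vals)

print(list(class = vals_class, avg = avg, min = lowest, max = highest))
```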

Works, but I lose column d:

df %>% 
  select(-d) %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE), 
         max = max(unlist(cur_data()), na.rm = TRUE)
  )

      a     b     c     e   avg   min   max
  <int> <int> <int> <int> <dbl> <dbl> <dbl>
1     1     6    11     1  4.75     1    11
2     2     7    12     2  5.75     2    12
3     3     8    13     3  6.75     3    13
4     4     9    14     4  7.75     4    14
5     5    10    15     5  8.75     5    15

Using pmap() from purrr might be preferable, since you need to select the data just once and you can use the select helpers:

df %>% 
 mutate(pmap_dfr(across(where(is.numeric)),
                 ~ data.frame(max = max(c(...)),
                              min = min(c(...)),
                              avg = mean(c(...)))))

      a     b     c d         e   max   min   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1    11     1  4.75
2     2     7    12 b         2    12     2  5.75
3     3     8    13 c         3    13     3  6.75
4     4     9    14 d         4    14     4  7.75
5     5    10    15 e         5    15     5  8.75

Or with the addition of tidyr:

df %>% 
 mutate(res = pmap(across(where(is.numeric)),
                   ~ list(max = max(c(...)),
                          min = min(c(...)),
                          avg = mean(c(...))))) %>%
 unnest_wider(res)
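To make the role of `...` in the pmap() calls above more concrete, here is a minimal sketch on a hypothetical two-element list named cols. It assumes purrr is installed; the lambda is called once per row, and `...` holds that row's values:

```r
library(purrr)

# Hypothetical toy input: two "columns" of equal length.
cols <- list(a = c(1, 2), b = c(3, 4))

# The lambda runs once per row; c(...) gathers that row into a vector.
row_means <- pmap_dbl(cols, ~ mean(c(...)))

print(row_means)
```

Because the lambda only ever sees the columns passed to pmap(), restricting the input with across(where(is.numeric)) keeps the character column out of the calculation while leaving it in the data frame.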

Edit:

Best way out here:

df %>%
  rowwise() %>% 
  mutate(min = min(c_across(a:e & where(is.numeric)), na.rm = TRUE),
         max = max(c_across(a:e & where(is.numeric)), na.rm = TRUE), 
         avg = mean(c_across(a:e & where(is.numeric)), na.rm = TRUE)
  )

# A tibble: 5 x 8
# Rowwise: 
      a     b     c d         e   min   max   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1     1    11  4.75
2     2     7    12 b         2     2    12  5.75
3     3     8    13 c         3     3    13  6.75
4     4     9    14 d         4     4    14  7.75
5     5    10    15 e         5     5    15  8.75
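The selection a:e & where(is.numeric) is an intersection: it keeps the columns that fall in the range a:e and are also numeric. A small sketch with select() shows which columns that picks. It assumes dplyr 1.0.0 or later and rebuilds df with tibble() instead of the structure() call from the question:

```r
library(dplyr)

# Same data as the question's df, rebuilt with tibble() for brevity.
df <- tibble(a = 1:5, b = 6:10, c = 11:15,
             d = c("a", "b", "c", "d", "e"), e = 1:5)

# Intersection of "columns a through e" and "numeric columns".
picked <- df %>% select(a:e & where(is.numeric))

print(names(picked))
```

Since c_across() accepts the same selection syntax, the character column d is skipped in the calculation but stays in the result.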

Earlier answer: your "works" version won't even work properly if you change the order of the output columns, see:

df %>% 
  select(-d) %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE), 
         avg = mean(unlist(cur_data()), na.rm = TRUE)
  )

# A tibble: 5 x 7
# Rowwise: 
      a     b     c     e   min   max   avg
  <int> <int> <int> <int> <int> <int> <dbl>
1     1     6    11     1     1    11  5.17
2     2     7    12     2     2    12  6.17
3     3     8    13     3     3    13  7.17
4     4     9    14     4     4    14  8.17
5     5    10    15     5     5    15  9.17
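The avg values above are consistent with cur_data() also containing the min and max columns created earlier in the same mutate() call. A base R check of the arithmetic for the first row, using a hand-built vector row1 as a stand-in for that row:

```r
# First row of the numeric columns, built by hand for illustration.
row1 <- c(a = 1, b = 6, c = 11, e = 1)

mn <- min(row1)
mx <- max(row1)

# Averaging the row together with the freshly added min and max
# gives 31 / 6, which prints as 5.17 in the tibble.
polluted_avg <- mean(c(row1, mn, mx))

# Averaging only the original columns gives the intended value.
intended_avg <- mean(row1)

print(c(polluted = polluted_avg, intended = intended_avg))
```

When avg is computed first, min and max still come out right even though avg is part of the row, because the average always lies between them. That is why the original column order only appeared to work.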

Therefore, it is advised to do it like this:

df %>% 
  select(-d) %>% 
  rowwise() %>% 
  mutate(min = min(c_across(a:e), na.rm = TRUE),
         max = max(c_across(a:e), na.rm = TRUE), 
         avg = mean(c_across(a:e), na.rm = TRUE)
  )

# A tibble: 5 x 7
# Rowwise: 
      a     b     c     e   min   max   avg
  <int> <int> <int> <int> <int> <int> <dbl>
1     1     6    11     1     1    11  4.75
2     2     7    12     2     2    12  5.75
3     3     8    13     3     3    13  6.75
4     4     9    14     4     4    14  7.75
5     5    10    15     5     5    15  8.75

One more alternative is:

cols <- c('a', 'b', 'c', 'e')
df %>%
  rowwise() %>% 
  # all_of() marks cols as an external character vector of column names
  mutate(min = min(c_across(all_of(cols)), na.rm = TRUE),
         max = max(c_across(all_of(cols)), na.rm = TRUE), 
         avg = mean(c_across(all_of(cols)), na.rm = TRUE)
  )

# A tibble: 5 x 8
# Rowwise: 
      a     b     c d         e   min   max   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1     1    11  4.75
2     2     7    12 b         2     2    12  5.75
3     3     8    13 c         3     3    13  6.75
4     4     9    14 d         4     4    14  7.75
5     5    10    15 e         5     5    15  8.75

Even the group_by approach suggested by @Sinh won't work properly in these cases.

Here is one method that preserves the data.frame attribute in mutate: set a particular column as the row name attribute (column_to_rownames), then restore it as a column after the transformation.

library(dplyr)
library(tibble)
library(purrr)
df %>% 
   column_to_rownames('d') %>%
   mutate(max = reduce(., pmax), min = reduce(., pmin), 
         avg = rowMeans(.)) %>% 
   rownames_to_column('d')
#  d a  b  c e max min  avg
#1 a 1  6 11 1  11   1 4.75
#2 b 2  7 12 2  12   2 5.75
#3 c 3  8 13 3  13   3 6.75
#4 d 4  9 14 4  14   4 7.75
#5 e 5 10 15 5  15   5 8.75
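The same vectorised idea can be sketched in base R without moving d into the row names. This is an alternative formulation, not the answer's method; it assumes the numeric columns are picked with is.numeric, and num and out are names introduced here for illustration:

```r
# Same data as the question's df, as a plain data.frame.
df <- data.frame(a = 1:5, b = 6:10, c = 11:15,
                 d = c("a", "b", "c", "d", "e"), e = 1:5)

# Keep only the numeric columns for the calculations.
num <- df[vapply(df, is.numeric, logical(1))]

out <- df
# pmax()/pmin() take the columns as separate arguments, hence do.call().
out$max <- do.call(pmax, unname(as.list(num)))
out$min <- do.call(pmin, unname(as.list(num)))
out$avg <- rowMeans(num)

print(out)
```

Like the reduce(., pmax) version, this works on whole columns at once instead of row by row, which tends to matter on larger data.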
