dplyr mutate: rowwise max of a range of columns

Question

I can use the following to return the maximum of 2 columns:

newiris<-iris %>%
 rowwise() %>%
 mutate(mak=max(Sepal.Width,Petal.Length))

What I would like to do is find the maximum across a range of columns, so that I don't have to name each column, like this:

newiris<-iris %>%
 rowwise() %>%
 mutate(mak=max(Sepal.Width:Petal.Length))

Any ideas?

Answer 1

Instead of rowwise(), this can be done with pmax:

iris %>%
      mutate(mak=pmax(Sepal.Width,Petal.Length, Petal.Width))
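
To illustrate why pmax removes the need for rowwise(), here is a minimal sketch on two made-up vectors (a and b are placeholders, not taken from the question). max() collapses all values into a single number, while pmax() compares the vectors element by element, which is what a per-row maximum across columns amounts to.

```r
# Two toy vectors standing in for two columns of a data frame
a <- c(1, 5, 2)
b <- c(3, 2, 2)

overall_max  <- max(a, b)   # one value for everything: 5
parallel_max <- pmax(a, b)  # one value per position:   3 5 2

print(overall_max)
print(parallel_max)
```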

We can possibly use interp from library(lazyeval) if we want to refer to column names stored in a vector.

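library(lazyeval)
nm1 <- names(iris)[2:4]
# as.name() only uses the first element of a vector, so give each column
# name its own placeholder. mutate_() is deprecated since dplyr 0.7.
iris %>% 
     mutate_(mak= interp(~pmax(v1, v2, v3), v1= as.name(nm1[1]),
                         v2= as.name(nm1[2]), v3= as.name(nm1[3])))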

Answer 2

With rlang and quasiquotation we have another dplyr option. First, get the names of the columns that we want to compute the parallel max for:

iris_cols <- iris %>% select(Sepal.Length:Petal.Width) %>% names()

Then we can use !!! and rlang::syms to compute the parallel max of those columns for every row:

iris %>%
  mutate(mak=pmax(!!!rlang::syms(iris_cols)))

rlang::syms takes string input (the column names) and turns each string into a symbol
!!! unquotes and splices its argument, here the column names

This gives:

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species mak
1            5.1         3.5          1.4         0.2     setosa 5.1
2            4.9         3.0          1.4         0.2     setosa 4.9
3            4.7         3.2          1.3         0.2     setosa 4.7
4            4.6         3.1          1.5         0.2     setosa 4.6
5            5.0         3.6          1.4         0.2     setosa 5.0
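
To make the two steps above more concrete, the following sketch captures the call that !!! builds instead of evaluating it. It assumes only that rlang is installed; cols is an example vector chosen for illustration.

```r
library(rlang)

cols <- c("Sepal.Width", "Petal.Length")  # example column names

# syms() turns each string into a symbol
col_syms <- syms(cols)

# !!! splices the symbols in as separate arguments; expr() captures the
# resulting call without evaluating it
spliced_call <- expr(pmax(!!!col_syms))
print(spliced_call)
```

So inside mutate(), pmax receives the columns as if they had been typed out by hand.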

h/t: https://stackoverflow.com/a/47773379/1036500

Answer 3

As of dplyr 1.0.2, this works:

newiris<-iris %>%
 rowwise() %>%
 mutate(mak=max(c_across(Sepal.Width:Petal.Length)))

This also lets you use selection helpers (starts_with etc.).
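
As a sketch of the selection-helper variant, assuming dplyr 1.0.0 or later: the snippet below takes the rowwise max over every column whose name starts with "Petal". The name petal_max is only illustrative.

```r
library(dplyr)

petal_max <- iris %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into a vector
  mutate(mak = max(c_across(starts_with("Petal")))) %>%
  ungroup()  # drop the rowwise grouping again

head(petal_max, 3)
```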

Answer 4

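To select some columns without typing their full names when using dplyr, I prefer the select argument of the subset function.

You can get the desired result like this: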

iris %>% subset(select = 2:4) %>% mutate(mak = do.call(pmax, (.))) %>%
  select(mak) %>% cbind(iris)

Answer 5

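One approach is to pipe the data into select and then call pmax through a function that applies it row by row. This is very similar to @inscaven's answer that uses do.call. Unfortunately there is no rowMaxs function in base R, so we need such a wrapper; below I use purrr::pmap.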

library(dplyr)
library(purrr)

# to get the value of the max
iris$rowwisemax <- iris %>% select(Sepal.Width:Petal.Length) %>% pmap(pmax) %>% as.numeric

# to get the argmax
iris$whichrowwisemax <- iris %>% select(Sepal.Width:Petal.Length) %>% {names(.)[max.col(.)]}

Answer 6

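It seems that @akrun's answer only addresses the case where you can type in the names of all the variables, whether using mutate directly with mutate(pmax_value=pmax(var1, var2)) or using lazy evaluation with mutate_ and interp via mutate_(interp(~pmax(v1, v2), v1=as.name(var1), v2=as.name(var2))).

If you want to use the colon syntax Sepal.Length:Petal.Width, or if you happen to have a vector with the column names, I can see two ways to do this.

The first is more elegant. You tidy the data and take the maximum of the values when grouped: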

data(iris)
library(dplyr)
library(tidyr)

iris_id = iris %>% mutate(id=1:nrow(.))
iris_id %>%
  gather('attribute', 'value', Sepal.Length:Petal.Width) %>%
  group_by(id) %>%
  summarize(max_attribute=max(value)) %>%
  right_join(iris_id, by='id') %>%
  head(3)
## # A tibble: 3 × 7
##      id max_attribute Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##   <int>         <dbl>        <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
## 1     1           5.1          5.1         3.5          1.4         0.2  setosa
## 2     2           4.9          4.9         3.0          1.4         0.2  setosa
## 3     3           4.7          4.7         3.2          1.3         0.2  setosa
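
gather() has since been superseded in tidyr by pivot_longer(). A sketch of the same tidy approach, assuming tidyr 1.0.0 or later (iris_max is an illustrative name), could look like this:

```r
library(dplyr)
library(tidyr)

# Add a row id so the long data can be grouped back per original row
iris_id <- iris %>% mutate(id = row_number())

iris_max <- iris_id %>%
  pivot_longer(Sepal.Length:Petal.Width,
               names_to = "attribute", values_to = "value") %>%
  group_by(id) %>%
  summarize(max_attribute = max(value)) %>%
  right_join(iris_id, by = "id")

head(iris_max, 3)
```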

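The harder way is to use an interpolated formula. This is a good fit if you have a character vector with the names of the variables to take the maximum over, or if your table is too tall/wide to be tidied.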

# Make a character vector of the names of the columns we want to take the
# maximum over
target_columns = iris %>% select(-Species) %>% names
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

# Make a vector of dummy variables that will take the place of the real
# column names inside the interpolated formula
dummy_vars = sapply(1:length(target_columns), function(i) sprintf('x%i', i))
## [1] "x1" "x2" "x3" "x4"

# Paste those variables together to make the argument of the pmax in the
# interpolated formula
dummy_vars_string = paste0(dummy_vars, collapse=',')
## [1] "x1,x2,x3,x4"

# Make a named list that maps the dummy variable names (e.g., x1) to the
# real variable names (e.g., Sepal.Length)
dummy_vars_list = lapply(target_columns, as.name) %>% setNames(dummy_vars)
## $x1
## Sepal.Length
##
## $x2
## Sepal.Width
## 
## $x3
## Petal.Length
##
## $x4
## Petal.Width

# Make a pmax formula using the dummy variables
max_formula = as.formula(paste0(c('~pmax(', dummy_vars_string, ')'), collapse=''))
## ~pmax(x1, x2, x3, x4)

# Interpolate the formula using the named variables
library(lazyeval)
iris %>%
  mutate_(max_attribute=interp(max_formula, .values=dummy_vars_list)) %>%
  head(3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species max_attribute
## 1          5.1         3.5          1.4         0.2  setosa           5.1
## 2          4.9         3.0          1.4         0.2  setosa           4.9
## 3          4.7         3.2          1.3         0.2  setosa           4.7
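
For readers on a recent dplyr, the same result from a character vector of column names can be sketched without lazyeval. This assumes dplyr 1.0.0 or later for across() and all_of(), and it swaps formula interpolation for do.call(), so it is an alternative technique rather than the method described above.

```r
library(dplyr)

# Character vector of the columns to take the maximum over
target_columns <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

iris_max <- iris %>%
  # across() returns the selected columns as a data frame; do.call() passes
  # each column to pmax as a separate argument
  mutate(max_attribute = do.call(pmax, across(all_of(target_columns))))

head(iris_max, 3)
```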

Answer 7

Here is a base-R solution: subset() can be used to select a range of column names. The rowwise maximum can then be added with a combination of transform() and apply().

newiris <- transform(iris, mak = apply(subset(iris, select=Sepal.Width:Petal.Length), 1, max))

Answer 8

If one wants to use selection helpers such as contains() or starts_with(), we can use

library(dplyr)
iris |> 
  mutate(max_value = purrr::pmap_dbl(select(iris, contains("petal")), pmax, na.rm=TRUE))
