R dplyr rowwise mean or min and other methods?
How can I use dplyr to get the minimum (or the mean) of each row of a data.frame? I mean the same result as
apply(mydataframe, 1, mean)
apply(mydataframe, 1, min)
I tried
mydataframe %>% rowwise() %>% mean
or
mydataframe %>% rowwise() %>% summarise(mean)
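and other combinations, but I always get an error, and I don't know the correct way.
I know I could also use rowMeans, but there is no simple "rowMin" equivalent. There is also the matrixStats package, but most of its functions accept only matrices, not data.frames.
If I want the row-wise minimum, I can use
do.call(pmin, mydataframe)
Is there something equally simple for the row-wise mean?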
do.call(mean, mydataframe)
doesn't work; I suppose I need a pmean function or something more complex.
Thanks
So that results can be compared, we can all work on the same example:
set.seed(124)
df <- data.frame(A=rnorm(10), B=rnorm(10), C=rnorm(10))
I think this is what you are trying to accomplish:
df <- data.frame(A=rnorm(10), B=rnorm(10), C=rnorm(10))
library(dplyr)
df %>% rowwise() %>% mutate(Min = min(A, B, C), Mean = mean(c(A, B, C)))
# A B C Min Mean
# 1 1.3720142 0.2156418 0.61260582 0.2156418 0.73342060
# 2 -1.4265665 -0.2090585 -0.05978302 -1.4265665 -0.56513600
# 3 0.6801410 1.5695065 -2.70446924 -2.7044692 -0.15160724
# 4 0.0335067 0.8367425 -0.83621791 -0.8362179 0.01134377
# 5 -0.2068252 -0.2305140 0.23764322 -0.2305140 -0.06656532
# 6 -0.3571095 -0.8776854 -0.80199141 -0.8776854 -0.67892877
# 7 1.0667424 -0.6376245 -0.41189564 -0.6376245 0.00574078
# 8 -1.0003376 -1.5985281 0.90406055 -1.5985281 -0.56493504
# 9 -0.8218494 1.1100531 -1.12477401 -1.1247740 -0.27885677
# 10 0.7868666 0.6099156 -0.58994138 -0.5899414 0.26894694
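There seems to be talk that some dplyr functions, such as rowwise, may be deprecated in the long run (the rumblings in question were linked from the original post). Instead, some functions from purrr's map family (for example pmap) can be used to perform this kind of calculation: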
library(tidyverse)
df %>% mutate(Min = pmap(df, min), Mean = rowMeans(.))
# A B C Min Mean
# 1 -1.38507062 0.3183367 -1.10363778 -1.385071 -0.7234572
# 2 0.03832318 -1.4237989 0.44418506 -1.423799 -0.3137635
# 3 -0.76303016 -0.4050909 -0.20495061 -0.7630302 -0.4576905
# 4 0.21230614 0.9953866 1.67563243 0.2123061 0.9611084
# 5 1.42553797 0.9588178 -0.13132225 -0.1313222 0.7510112
# 6 0.74447982 0.9180879 -0.19988298 -0.199883 0.4875616
# 7 0.70022940 -0.1509696 0.05491242 -0.1509696 0.2013907
# 8 -0.22935461 -1.2230688 -0.68216549 -1.223069 -0.7115296
# 9 0.19709386 -0.8688243 -0.72770415 -0.8688243 -0.4664782
# 10 1.20715377 -1.0424854 -0.86190429 -1.042485 -0.2324120
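The mean is a special case (hence the use of the base function rowMeans), because calling mean on a data.frame object was deprecated and is no longer supported as of R 3.0.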
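One caveat with the snippet above: pmap returns a list, so Min ends up as a list column. A minimal sketch, assuming purrr's typed variant pmap_dbl is acceptable here, that yields an ordinary numeric column instead (the data is the shared example from the question):

```r
library(dplyr)
library(purrr)

set.seed(124)
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

# pmap_dbl() calls min() once per row (passing A, B and C as arguments)
# and returns a double vector, so Min is a plain numeric column.
res <- df %>%
  mutate(Min = pmap_dbl(df, min),
         Mean = rowMeans(df))
```

Because Min is numeric rather than a list, it can be sorted, filtered, or summarised like any other column.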
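How about this?
library(dplyr)
# Transpose so that each original row becomes a column, then average every column.
# funs() is deprecated in recent dplyr versions, so the function is passed directly.
as.data.frame(t(mtcars)) %>%
  summarise_all(mean)
For more clarity, you can add another t() at the end:
as.data.frame(t(mtcars)) %>%
  summarise_all(mean) %>%
  t()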
With dplyr 1.0.0, you can use rowwise together with c_across:
library(dplyr)
df %>%
rowwise() %>%
mutate(Min = min(c_across(A:C)),
Mean = mean(c_across(A:C)))
# A B C Min Mean
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 -1.39 0.318 -1.10 -1.39 -0.723
# 2 0.0383 -1.42 0.444 -1.42 -0.314
# 3 -0.763 -0.405 -0.205 -0.763 -0.458
# 4 0.212 0.995 1.68 0.212 0.961
# 5 1.43 0.959 -0.131 -0.131 0.751
# 6 0.744 0.918 -0.200 -0.200 0.488
# 7 0.700 -0.151 0.0549 -0.151 0.201
# 8 -0.229 -1.22 -0.682 -1.22 -0.712
# 9 0.197 -0.869 -0.728 -0.869 -0.466
#10 1.21 -1.04 -0.862 -1.04 -0.232
A dplyr and purrr option, in which selection helpers can be used:
df %>%
mutate(Min = select(., everything()) %>% reduce(pmin),
Max = select(., everything()) %>% reduce(pmax))
A B C Min Max
1 -1.38507062 0.3183367 -1.10363778 -1.3850706 0.3183367
2 0.03832318 -1.4237989 0.44418506 -1.4237989 0.4441851
3 -0.76303016 -0.4050909 -0.20495061 -0.7630302 -0.2049506
4 0.21230614 0.9953866 1.67563243 0.2123061 1.6756324
5 1.42553797 0.9588178 -0.13132225 -0.1313222 1.4255380
6 0.74447982 0.9180879 -0.19988298 -0.1998830 0.9180879
7 0.70022940 -0.1509696 0.05491242 -0.1509696 0.7002294
8 -0.22935461 -1.2230688 -0.68216549 -1.2230688 -0.2293546
9 0.19709386 -0.8688243 -0.72770415 -0.8688243 0.1970939
10 1.20715377 -1.0424854 -0.86190429 -1.0424854 1.2071538
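The same reduce idea can be stretched to the row-wise mean the question asks about: add the columns element-wise and divide by the number of columns. This is only a sketch, assuming all columns are numeric and contain no NA values; pmean is a hypothetical helper name, not an existing base R or dplyr function.

```r
library(dplyr)
library(purrr)

set.seed(124)
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

# Hypothetical helper: element-wise ("parallel") mean of several vectors,
# by analogy with pmin() and pmax().
pmean <- function(...) {
  cols <- list(...)
  reduce(cols, `+`) / length(cols)
}

# do.call() spreads the columns of df over the arguments of pmean(),
# mirroring do.call(pmin, mydataframe) from the question.
res <- df %>%
  mutate(Mean = do.call(pmean, df))
```

For plain means, rowMeans remains the simpler choice; the helper mainly shows how the do.call(pmin, ...) pattern generalises.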
I think I found a solution - just transpose your data.frame:
library(dplyr)
# data_frame() is deprecated; tibble() is its replacement
x <- tibble(x = rnorm(10),
            y = rnorm(10))
# A tibble: 10 × 2
x y
<dbl> <dbl>
1 -1.1240392 0.9306028477
2 -0.8213379 0.2500495105
3 -0.8289104 -0.3693704483
4 -0.6486601 -1.1421141986
5 0.5098542 -0.3703368343
6 -0.3644690 -0.0003744377
7 0.7404057 0.1166905738
8 -0.2475214 -0.0802864865
9 0.2637841 -0.7717699521
10 1.4092874 0.2998021578
x %>%
t() %>%
data.frame() %>%
mutate_all(min) %>%
unique() %>%
t()
1
X1 -1.1240392
X2 -0.8213379
X3 -0.8289104
X4 -1.1421142
X5 -0.3703368
X6 -0.3644690
X7 0.1166906
X8 -0.2475214
X9 -0.7717700
X10 0.2998022
How can you avoid having to specify every column name? Like this:
set.seed(124)
df <- data.frame(A=rnorm(10), B=rnorm(10), C=rnorm(10))
library(dplyr)
df %>%
rowwise() %>%
mutate(Mean = mean(
eval(
# The snippet can also be wrapped within a function
parse(text = sprintf("c(%s)", paste(names(.), collapse = ",")))
)
),
ArgMin = which.min(
eval(
parse(text = sprintf("c(%s)", paste(names(.), collapse = ",")))
)
))
or
getColnamesExpr <- function(df_names) parse(text = sprintf("c(%s)", paste(df_names, collapse = ",")))
df %>%
rowwise() %>%
mutate(
Mean = mean(eval(getColnamesExpr(names(.)))),
argmin = which.min(eval(getColnamesExpr(names(.))))
)
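With dplyr 1.0.0 or later, the eval(parse(...)) construction can probably be avoided, since c_across accepts tidyselect helpers. A sketch under the assumption that every column of the original data frame should take part in the computation; the variable name cols is introduced here for illustration only:

```r
library(dplyr)

set.seed(124)
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

# Capture the column names once, before any new columns are added,
# so that Mean is not fed back into the ArgMin computation.
cols <- names(df)

res <- df %>%
  rowwise() %>%
  mutate(Mean = mean(c_across(all_of(cols))),
         ArgMin = which.min(c_across(all_of(cols)))) %>%
  ungroup()
```

Using all_of(cols) rather than everything() is a deliberate choice here: everything() in the second expression would also pick up the freshly created Mean column.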