Rowwise median for multiple columns using dplyr
Given the following dataset, I want to compute, for each row, the median of the columns M1, M2 and M3. I am looking for a solution where the result is added to the dataframe as a final column named 'Median'. The column names (M1:M3) should not be used directly (in the original dataset, there are many more columns, not just 3).
# A tibble: 8 x 5
     I1    M1    M2    I2    M3
  <int> <int> <int> <int> <int>
1     3     4     5     3     5
2     2     2     2     2     1
3     2     2     2     2     2
4     3     1     3     3     1
5     2     1     3     3     1
6     3     2     4     4     3
7     3     1     3     4     1
8     2     1     3     2     3
You can load the dataset using:
df = structure(list(I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L), M1 = c(4L,
2L, 2L, 1L, 1L, 2L, 1L, 1L), M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L,
3L), I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L), M3 = c(5L, 1L, 2L,
1L, 1L, 3L, 1L, 3L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), .Names = c("I1", "M1", "M2", "I2",
"M3"))
I know that several similar questions have already been asked. However, most solutions posted use rowMeans or rowSums. I'm looking for a solution where:
The reason for (2) is that I am teaching the 'tidyverse' to total beginners.
We could use rowMedians from the matrixStats package:
library(matrixStats)
library(dplyr)
df %>%
  mutate(Median = rowMedians(as.matrix(.[grep('M\\d+', names(.))])))
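If an extra dependency on matrixStats is not wanted, a similar result can probably be had with base R's apply(). The following is only a sketch, not part of the original answer; it assumes the columns of interest all share the prefix M and rebuilds the question's data so that it runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly so the block is self-contained.
df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Pick the M columns by prefix (assumption: the prefix identifies them),
# then take the median across each row (MARGIN = 1).
m_cols <- select(df, starts_with("M"))
result <- df %>%
  mutate(Median = apply(as.matrix(m_cols), 1, median))

print(result)
```

Note that starts_with("M") would also match a column called Median, so the selection is done on df before that column exists.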
Or, if we need to use only tidyverse functions, convert the data to 'long' format with gather, summarise by row, and get the median of the 'value' column:
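library(dplyr)
library(tidyr)
df %>%
  # integer row id: character row names would sort "10" before "2"
  # and misalign the medians once there are more than 9 rows
  mutate(rn = row_number()) %>%
  gather(key, value, starts_with('M')) %>%
  group_by(rn) %>%
  summarise(Median = median(value)) %>%
  ungroup() %>%
  select(-rn) %>%
  bind_cols(df, .)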
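In newer versions of tidyr (1.0.0 and later), gather() is superseded by pivot_longer(). The same reshape-and-summarise idea might look like the sketch below; it is an illustration rather than the original answer's code. The helper column rn and the join back onto the data are choices made here, and the data is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question.
df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Tag each row with an integer id (rn is a name chosen for this sketch).
with_id <- df %>% mutate(rn = row_number())

# Long format: one row per (rn, M column) pair, then one median per rn.
medians <- with_id %>%
  pivot_longer(starts_with("M"), names_to = "key", values_to = "value") %>%
  group_by(rn) %>%
  summarise(Median = median(value))

# Join on the id instead of relying on row order, then drop the helper column.
result <- with_id %>%
  left_join(medians, by = "rn") %>%
  select(-rn)

print(result)
```

Joining on rn avoids depending on the order in which summarise() returns the groups.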
Another option is rowwise() from dplyr (assuming that row-by-row evaluation is not a problem):
df %>%
  rowwise() %>%
  mutate(Median = median(c(!!! rlang::syms(grep('M', names(.), value=TRUE)))))
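With dplyr 1.0.0 or later, c_across() offers what is likely a more beginner-friendly way to write the rowwise() approach, without rlang::syms(). This is a sketch under that version assumption, not the original answer's code; the regular expression ^M\\d+$ is chosen here so that a column named Median is never picked up by accident.

```r
library(dplyr)

# Same data as in the question.
df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

result <- df %>%
  rowwise() %>%                                        # evaluate mutate() one row at a time
  mutate(Median = median(c_across(matches("^M\\d+$")))) %>%
  ungroup()                                            # drop the rowwise grouping again

print(result)
```

The trailing ungroup() keeps later verbs from silently operating row by row.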
Given a dataframe df with some numeric values:
df <- structure(list(X0 = c(0.82046171427112, 0.836224720981912, 0.842547521493854,
0.848014287631906, 0.850943494153631, 0.85425398956647, 0.85616876970771,
0.856855792247478, 0.857471048654811, 0.857507363153284, 0.874487063791594,
1.70684558846347, 1.95711031206168, 6.84386713155156), X1 = c(0.755674148966666,
0.765242580861224, 0.774422478168495, 0.776953642833977, 0.778128315184819,
0.778611604461183, 0.778624581647491, 0.778454002430202, 1.52708579075974,
13.0356519295685, 18.0590093408357, 21.1371199340156, 32.4192814934364,
33.2355314147089), X2 = c(0.772236670327724, 0.788112332251601,
0.797695511542613, 0.804257521548174, 0.809815828400878, 0.816592605516508,
0.819421106011397, 0.821734473885381, 0.822561946509595, 0.822334970491528,
0.822404634095793, 2.66875340820162, 1.40412743557514, 6.33377768022403
), X3 = c(0.764363881671609, 0.788288196346034, 0.79927498357549,
0.805446784334039, 0.810604881970155, 0.814634331592811, 0.817002594424753,
0.818129844752095, 0.818572101954132, 0.818630700031836, 3.06323952591121,
6.4477868357554, 11.4657041958038, 9.27821049066848)), class = "data.frame", row.names = c(NA,
-14L))
One can easily compute the row-wise median with sapply() like so (apart from the %>% pipe, this is base R):
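library(dplyr)  # provides the %>% pipe used below
df$median <- sapply(
  seq_len(nrow(df)),  # seq_len() is also safe for a zero-row data frame
  function(i) df[i, 1:4] %>% unlist %>% median
)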
Above, I select the columns manually with a numeric range, but to satisfy the dplyr requirement you can use dplyr::select() to choose your columns:
df$median <- sapply(
  df %>% nrow %>% seq,
  function(i) df[i, ] %>%
    dplyr::select(X1, X2) %>%
    unlist %>% median
)
I like this method because you don't have to search for a different function for each statistic you want to calculate.
For example, standard deviation:
df$sd <- sapply(
  df %>% nrow %>% seq,
  function(i) df[i, ] %>%
    dplyr::select(X1, X2) %>%
    unlist %>% sd
)
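The repeated sapply() pattern above could be wrapped in a small helper so that only the column names and the summary function change. The sketch below is an illustration only: row_stat is a hypothetical name introduced here, and the three-row data frame is made up to keep the example short.

```r
# Hypothetical helper (not from the original answer): apply any summary
# function across the chosen columns of each row.
row_stat <- function(data, cols, fun) {
  apply(as.matrix(data[, cols, drop = FALSE]), 1, fun)
}

# Small made-up data frame, used only for illustration.
df <- data.frame(
  X1 = c(1, 4, 10),
  X2 = c(3, 8, 20),
  X3 = c(5, 6, 60)
)

# Same helper, different statistic each time.
df$median <- row_stat(df, c("X1", "X2", "X3"), median)
df$sd     <- row_stat(df, c("X1", "X2"), sd)

print(df)
```

Because the columns are passed by name, adding median or sd to df does not change which columns the next call reads.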