Return statistics like min or max from columns into rows with dplyr pipeline
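My question is similar to this one: "R dplyr rowwise mean or min and other methods?" I would like to know whether there is any dplyr function (or combination of functions such as pivot_ etc.) that gives the desired output in a usual dplyr one-liner.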
library(tidyverse); set.seed(1);
#Sample Data:
sampleData <- data.frame(O = seq(1, 9, by = .1), A = rnorm(81), U = sample(1:81,
81), I = rlnorm(81), R = sample(c(1, 81), 81, replace = T)); #sampleData;
#Normal output:
NormalOutput <- sampleData %>% summarise_all(list(min = min, max = max));
NormalOutput;
#> O_min A_min U_min I_min R_min O_max A_max U_max I_max R_max
#> 1 1 -2.2147 1 0.1970368 1 9 2.401618 81 14.27712 81
#Expected output:
ExpectedOutput <- data.frame(stats = c('min', 'max'), O = c(1, 9), A = c(-2.2147,
2.401618), U = c(1, 81), I = c(0.1970368, 14.27712), R = c(1, 81));
ExpectedOutput;
#> stats O A U I R
#> 1 min 1 -2.214700 1 0.1970368 1
#> 2 max 9 2.401618 81 14.2771200 81
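Created on 2020-08-26 by the reprex package (v0.3.0)
Note:
In the real scenario the number of columns may be large, so the columns cannot be called by name directly.
Edit
At best, I get: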
sampleData %>% summarise(across(everything(), list(min = min, max = max))) %>%
t() %>% data.frame(Value = .) %>% tibble::rownames_to_column('Variables')
Variables Value
1 O_min 1.0000000
2 O_max 9.0000000
3 A_min -2.2146999
4 A_max 2.4016178
5 U_min 1.0000000
6 U_max 81.0000000
7 I_min 0.1970368
8 I_max 14.2771167
9 R_min 1.0000000
10 R_max 81.0000000
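I would suggest a mix of tidyverse functions, as shown next. You have to reshape the data to long format, then aggregate with the summary functions you want, and finally reshape again to obtain the expected output: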
library(tidyverse)
sampleData %>% pivot_longer(cols = names(sampleData)) %>%
group_by(name) %>% summarise(Min=min(value,na.rm=T),
Max=max(value,na.rm=T)) %>%
rename(var=name) %>%
pivot_longer(cols = -var) %>%
pivot_wider(names_from = var,values_from=value)
Output:
# A tibble: 2 x 6
name A I O R U
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Min -2.21 0.197 1 1 1
2 Max 2.40 14.3 9 81 81
You can use the new-ish across() to eliminate one of Duck's pivots:
sampleData %>%
summarise(across(everything(),
list(min = min, max = max))) %>%
pivot_longer(
cols = everything(),
names_to = c("var", "stat"),
names_sep = "_"
) %>%
pivot_wider(id_cols = "stat",
names_from = "var")
# # A tibble: 2 x 6
# stat O A U I R
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 min 1 -2.21 1 0.197 1
# 2 max 9 2.40 81 14.3 81
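The names_sep = "_" split above relies on the sample column names containing no underscores of their own. As a minimal sketch for the case where they do, the names_pattern argument of pivot_longer() can split at the last underscore instead. The data frame toy and its column names are hypothetical and not from the original post:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; note the underscore inside a column name
toy <- data.frame(unit_price = c(2, 4, 6), qty = c(1L, 5L, 3L))

result <- toy %>%
  # Produces unit_price_min, unit_price_max, qty_min, qty_max
  summarise(across(everything(), list(min = min, max = max))) %>%
  pivot_longer(
    cols = everything(),
    names_to = c("var", "stat"),
    # The first group is greedy, so the split happens at the LAST underscore
    names_pattern = "(.*)_(.*)"
  ) %>%
  # One row per statistic, one column per original variable
  pivot_wider(id_cols = "stat", names_from = "var")

print(result)
```

This still assumes that the statistic names themselves (here min and max) contain no underscore.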
But the best is probably markus's suggestion in the comments, which I have adapted here:
map_dfr(sampleData, function(x) c(min(x), max(x))) %>%
mutate(stat = c("min", "max"))
# # A tibble: 2 x 6
# O A U I R stat
# <dbl> <dbl> <int> <dbl> <dbl> <chr>
# 1 1 -2.21 1 0.197 1 min
# 2 9 2.40 81 14.3 81 max
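For readers on a newer dplyr, a similar result may be obtainable without purrr. The following is only a sketch, assuming dplyr 1.1.0 or later (where reframe() is available), which the original thread does not state. It relies on range() returning the minimum and maximum in that order. The data frame toy is hypothetical:

```r
library(dplyr)

# Hypothetical toy data standing in for sampleData
toy <- data.frame(O = c(1, 5, 9), A = c(-2, 0, 2), U = c(1L, 40L, 81L))

result <- toy %>%
  # range() returns c(min, max), so each column yields two rows;
  # reframe() (assumed dplyr >= 1.1.0) allows multi-row summaries
  reframe(across(everything(), range)) %>%
  # Label the rows in the order range() returns them
  mutate(stat = c("min", "max"))

print(result)
```

This only covers the min/max pair, because it depends on range(); other statistics would need one of the approaches above.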
While playing with pivot_longer, I found that this two-step approach also works (based on @Gregor Thomas's answer; here there is only one pivot_ instead of two or more):
sampleData %>%
summarise(across(everything(), list(min, max))) %>%
pivot_longer(everything(), names_to = c(".value", "stats"),
names_sep = "_")
# A tibble: 2 x 6
stats O A U I R
<chr> <dbl> <dbl> <int> <dbl> <dbl>
1 1 1 -2.21 1 0.197 1
2 2 9 2.40 81 14.3 81
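In the output above the stats column holds 1 and 2, because the list passed to across() is unnamed. A minimal sketch of the same pipeline with a named list, which should label the rows min and max instead, follows. The data frame toy is hypothetical and stands in for sampleData:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for sampleData
toy <- data.frame(O = c(1, 5, 9), A = c(-2, 0, 2), U = c(1L, 40L, 81L))

result <- toy %>%
  # Named list -> summary columns O_min, O_max, A_min, A_max, ...
  summarise(across(everything(), list(min = min, max = max))) %>%
  # ".value" keeps the part before "_" as the output column name;
  # the part after "_" goes into the "stats" column
  pivot_longer(everything(),
               names_to = c(".value", "stats"),
               names_sep = "_")

print(result)
```

Naming the list elements is the only change; the single pivot_longer() call is kept as in the answer.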
More information: https://tidyr.tidyverse.org/reference/pivot_longer.html#examples