tidyverse summarize multiple columns but show result as rows
I have data for which I want to compute several summary statistics across multiple columns using the tidyverse approach. However, tidyverse's summarize function creates each column statistic as a new column, whereas I would prefer to see the column names as rows and each statistic as a new column. So my question is:
Is there a more elegant (and I know "elegant" is a vague term) way to achieve this than by following the summarize function with a pivot_longer and a pivot_wider?
I'm using the latest dev versions of the tidyverse packages, i.e. dplyr 0.8.99.9003 and tidyr 1.1.0. So it's fine if a solution requires new functions from these packages that are not yet on CRAN.
library(tidyverse)
dat <- as.data.frame(matrix(1:100, ncol = 5))
dat %>%
  summarize(across(everything(), list(mean = mean,
                                      sum = sum))) %>%
  pivot_longer(cols = everything(),
               names_sep = "_",
               names_to = c("variable", "statistic")) %>%
  pivot_wider(names_from = "statistic")
Expected outcome:
# A tibble: 5 x 3
  variable  mean   sum
  <chr>    <dbl> <dbl>
1 V1        10.5   210
2 V2        30.5   610
3 V3        50.5  1010
4 V4        70.5  1410
5 V5        90.5  1810
Note: I'm not set on the name of any of the columns, so if there's a nice way to get this table structure with different or generic names, that'd also be fine.
Not a tidyverse solution, but a data.table one instead. Also, I'm not sure whether it is more 'elegant' ;-)
But here you go...
library(data.table)
# make 'dat' a data.table
setDT(dat)
# transpose, keeping column names
dat <- transpose(dat, keep.names = "var_name")
# melt to long and summarise
melt(dat, id.vars = "var_name")[, .(mean = mean(value), sum = sum(value)), by = var_name]
#    var_name mean  sum
# 1:       V1 10.5  210
# 2:       V2 30.5  610
# 3:       V3 50.5 1010
# 4:       V4 70.5 1410
# 5:       V5 90.5 1810
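As a possible variant of the data.table approach above, the transpose step can be skipped by melting all columns directly. This is only a sketch: it assumes every column is numeric, it rebuilds the question's example data as a data.table, and the result names `long` and `res_dt` are made up for illustration.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
dat <- as.data.table(matrix(1:100, ncol = 5))

# Stack every column into (variable, value) pairs
long <- melt(dat, measure.vars = names(dat))

# One row per original column, one column per statistic
res_dt <- long[, .(mean = mean(value), sum = sum(value)), by = variable]
res_dt
```

Here the grouping column is called variable (melt's default name) rather than var_name, and it is a factor rather than a character column.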
You can skip the pivot_wider step by using ".value" in names_to.
library(dplyr)
dat %>%
  summarise_all(list(mean = mean, sum = sum)) %>%
  tidyr::pivot_longer(cols = everything(),
                      names_sep = "_",
                      names_to = c("variable", ".value"))
# A tibble: 5 x 3
#  variable  mean   sum
#  <chr>    <dbl> <int>
#1 V1        10.5   210
#2 V2        30.5   610
#3 V3        50.5  1010
#4 V4        70.5  1410
#5 V5        90.5  1810
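The same ".value" idea can be combined with the across() syntax from the question. The sketch below also swaps names_sep for names_pattern, which is an addition not in the original answer: it splits at the last underscore, so column names that themselves contain underscores would still be handled. It assumes a dplyr version that provides across() and that the statistic names (mean, sum) contain no underscore; `res` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
dat <- as.data.frame(matrix(1:100, ncol = 5))

res <- dat %>%
  # across() names its output columns "<column>_<statistic>", e.g. V1_mean
  summarise(across(everything(), list(mean = mean, sum = sum))) %>%
  pivot_longer(
    cols = everything(),
    # ".value" turns the statistic part of each name into its own column
    names_to = c("variable", ".value"),
    # split at the LAST underscore: (column name)_(statistic)
    names_pattern = "^(.*)_([^_]+)$"
  )

res
```

With names_sep = "_", a column called my_col would be split into three pieces instead of two, which is why the pattern version may be the safer choice for real data.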
You can first stack all columns together and then summarise by group.
dat %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise_at("value", list(~mean(.), ~sum(.)))
# # A tibble: 5 x 3
#   name   mean   sum
#   <chr> <dbl> <int>
# 1 V1     10.5   210
# 2 V2     30.5   610
# 3 V3     50.5  1010
# 4 V4     70.5  1410
# 5 V5     90.5  1810
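The stack-then-summarise approach can also be written with plain summarise() and explicitly named statistics, which avoids relying on how summarise_at() names its output columns. This is a minimal sketch, not the answerer's code: the column name variable and the result name `res_long` are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
dat <- as.data.frame(matrix(1:100, ncol = 5))

res_long <- dat %>%
  # Stack all columns: one row per (original column, value) pair
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  # Each named argument becomes one statistic column
  summarise(mean = mean(value), sum = sum(value))

res_long
```

Adding another statistic is then a matter of adding one more named argument, for example sd = sd(value).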