简体   繁体   English

排序数据以获取 R 中每个组的大小、平均值和 SD

[英]Sorting data to get size, mean and SD for each group in R

A small sample of the data is shown below.数据的一个小样本如下所示。 Please consider I have more columns请考虑我有更多的列

dat<-read.table (text="Tall1    Group1  Tall2   Group2  Tall3   Group3  Tall4   Group4
    25  M   24  M   23  N   33  N
    34  N   16  M   23  M   43  N
    41  N   20  M   44  N   60  N
    25  M   24  N   44  N   55  M
    26  N   12  N   44  M   90  M
    ", header=TRUE)

I want to get the following table我想得到下表

NoM MeanM   SDM NoN MeanN   SDN
2   25  0   3   33.66   7.5
3   20  4   2   18  8.45
2   33.5    14.48   3   37  12.12
2   72.5    24.74   3   45.33   13.65

I want to get N, mean and SD for each group according to their Talls, ie, Tall1 with Group1, Tall2 with Group2 and...我想根据每个组的 Talls 获得 N、平均值和 SD,即 Tall1 和 Group1,Tall2 和 Group2 和......

library(dplyr)
library(tidyr)
dat %>% 
    pivot_longer(cols = everything(), names_to = c(".value", "set"), names_pattern = "([A-Za-z]*)(\\d+)$") %>% 
    group_by(set, Group) %>% 
    summarise(No=n(), Mean = mean(Tall), SD = sd(Tall), .groups = "drop") %>% 
    pivot_wider(names_from = Group, values_from = c(No, Mean, SD), names_sep = "") %>%
    select(-set)

`summarise()` has grouped output by 'set'. You can override using the `.groups` argument.
# A tibble: 4 × 6
    NoM   NoN MeanM MeanN   SDM   SDN
  <int> <int> <dbl> <dbl> <dbl> <dbl>
1     2     3  25    33.7   0    7.51
2     3     2  20    18     4    8.49
3     2     3  33.5  37    14.8 12.1 
4     2     3  72.5  45.3  24.7 13.7 

This should work.这应该有效。 I couldn't think of a good way to get the pairs of columns together, so had to kind of force it.我想不出一个把成对的柱子放在一起的好方法,所以不得不强迫它。

library(dplyr)
library(tidyr)

dat_value<-dat %>% select(contains("Tall")) %>% pivot_longer(cols=1:4,names_to = "category",values_to="values")

dat_group<-dat %>% select(contains("Group")) %>% pivot_longer(cols=1:4,names_to = "category",values_to="group") %>%
  select(group)

new_dat<-dat_value %>%
  bind_cols(dat_group) %>%
  group_by(category) %>%
  summarize(NoM=sum(group=="M"),
            MeanM=mean(values[group=="M"]),
            SDM=sd(values[group=="M"]),
            NoN=sum(group=="N"),
            MeanN=mean(values[group=="N"]),
            SDN=sd(values[group=="N"])
            ) %>%
  ungroup()

new_dat
#> # A tibble: 4 × 7
#>   category   NoM MeanM   SDM   NoN MeanN   SDN
#>   <chr>    <int> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 Tall1        2  25     0       3  33.7  7.51
#> 2 Tall2        3  20     4       2  18    8.49
#> 3 Tall3        2  33.5  14.8     3  37   12.1 
#> 4 Tall4        2  72.5  24.7     3  45.3 13.7

Created on 2022-01-14 by the reprex package (v2.0.1)代表 package (v2.0.1) 于 2022 年 1 月 14 日创建

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM