
Summarize by group and across multiple columns in R

I have a data frame containing some basic information about students. I want to get summary statistics for Age, Sex, and Group.

set.seed(500)
testdf <- data.frame(ID = paste0("Stu", 1:10),
                     Age = sample(18:25, 10, replace = TRUE),
                     Sex = sample(c("Boy", "Girl", "NA"), 10, replace = TRUE),
                     Name = c("Pwyll", "Flavian", "Leehi", "Zuzana", "Aniya",
                              "Bogomil", "Lameez", "Prudencia", "Ikuo", "Grayson"),
                     GroupMath = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupEng = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupScie = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupChine = sample(LETTERS[1:2], 10, replace = TRUE))

I want the result to look like this picture (the desired output):

[Image: desired output (not preserved)]

In my code I use three separate parts (group sizes, age statistics, and sex counts) to handle GroupMath, and then repeat them for GroupEng, GroupScie, and GroupChine. Does anyone know how I can make this more efficient? Thanks so much.

library(dplyr)

# Number of students in each math group
N.math <- testdf %>% count(GroupMath)

# Age statistics per math group
Age.math <- testdf %>%
  group_by(GroupMath) %>%
  summarise(Mean = mean(Age),
            Max = max(Age),
            Min = min(Age),
            sd = sd(Age))

# Sex counts per math group
Sex.math <- testdf %>% count(GroupMath, Sex)
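For comparison with the answer below, the repetition in the code above could also be removed without reshaping, by wrapping the three steps in a function and looping over the column names. This is a sketch of that alternative, not the approach taken in the answer; the helper name summarise_group_col is hypothetical, and !!sym(col) is used to turn a column name stored as a string into a column reference for dplyr.

```r
library(dplyr)

# Rebuild the example data from the question (Name omitted; it is not used)
set.seed(500)
testdf <- data.frame(ID = paste0("Stu", 1:10),
                     Age = sample(18:25, 10, replace = TRUE),
                     Sex = sample(c("Boy", "Girl", "NA"), 10, replace = TRUE),
                     GroupMath = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupEng = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupScie = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupChine = sample(LETTERS[1:2], 10, replace = TRUE))

# Hypothetical helper: the three summaries from the question for one
# grouping column whose name is passed as a string
summarise_group_col <- function(df, col) {
  list(
    N = df %>% count(!!sym(col)),
    Age = df %>%
      group_by(!!sym(col)) %>%
      summarise(Mean = mean(Age), Max = max(Age),
                Min = min(Age), sd = sd(Age)),
    Sex = df %>% count(!!sym(col), Sex)
  )
}

# Apply the helper to every group column and name the results by column
group_cols <- c("GroupMath", "GroupEng", "GroupScie", "GroupChine")
results <- lapply(group_cols, function(col) summarise_group_col(testdf, col))
names(results) <- group_cols

results$GroupMath$Age
```

This still produces one set of tables per subject, so it stays close to the original code; the long-format approach below instead puts all subjects into a single table.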

Pivoting Data to Long Form

Here is at least one way to simplify your summary statistics so you are not aggregating each group column one by one. First, pivot your data to long format, putting the names of the group columns into a Class column and their values into a Group column; then summarise the data by these columns. First, the pivot:

#### Load Tidyverse ####
library(tidyverse)

#### Pivot to Long Format ####
groups <- testdf %>%
  pivot_longer(cols = contains("Group"),
               names_to = "Class",
               values_to = "Group")
groups

Which looks like this (first 10 of 40 rows):

# A tibble: 40 × 6
   ID      Age Sex   Name    Class      Group
   <chr> <int> <chr> <chr>   <chr>      <chr>
 1 Stu1     24 Girl  Pwyll   GroupMath  B    
 2 Stu1     24 Girl  Pwyll   GroupEng   B    
 3 Stu1     24 Girl  Pwyll   GroupScie  A    
 4 Stu1     24 Girl  Pwyll   GroupChine B    
 5 Stu2     20 Girl  Flavian GroupMath  B    
 6 Stu2     20 Girl  Flavian GroupEng   A    
 7 Stu2     20 Girl  Flavian GroupScie  A    
 8 Stu2     20 Girl  Flavian GroupChine B    
 9 Stu3     24 NA    Leehi   GroupMath  A    
10 Stu3     24 NA    Leehi   GroupEng   B 
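To see why the pivot removes the repetition, here is a minimal sketch on a made-up three-student data frame (toy and its values are invented for illustration). Each student becomes one row per group column, with ID and Age repeated, so a single group_by(Class, Group) later covers every subject at once. starts_with() is used here; contains() as in the answer works equally well for these column names.

```r
library(tidyr)

# Made-up data: three students and two of the four group columns
toy <- data.frame(ID = c("Stu1", "Stu2", "Stu3"),
                  Age = c(24, 20, 22),
                  GroupMath = c("B", "B", "A"),
                  GroupEng = c("B", "A", "A"))

# Column names go into Class, cell values into Group
toy_long <- pivot_longer(toy,
                         cols = starts_with("Group"),
                         names_to = "Class",
                         values_to = "Group")

# 3 students x 2 group columns = 6 rows
toy_long
```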

Aggregating Data

Then you can aggregate the data by class, group, and sex:

#### Aggregate Data by Class x Group x Sex ####
sums <- groups %>%
  group_by(Class, Group, Sex) %>%
  summarise(Mean = mean(Age),
            Max = max(Age),
            Min = min(Age),
            sd = sd(Age)) %>%
  ungroup()
sums

The result is shown below. Notice that some sd values are NA: in those cells there is only one student of that sex, and a standard deviation cannot be computed from a single observation.

# A tibble: 16 × 7
   Class      Group Sex    Mean   Max   Min    sd
   <chr>      <chr> <chr> <dbl> <int> <int> <dbl>
 1 GroupChine A     Boy    24      24    24 NA   
 2 GroupChine A     Girl   22.3    24    21  1.53
 3 GroupChine B     Girl   20      24    18  2.35
 4 GroupChine B     NA     24      24    24 NA   
 5 GroupEng   A     Boy    24      24    24 NA   
 6 GroupEng   A     Girl   20.5    24    19  2.38
 7 GroupEng   B     Girl   21.2    24    18  2.5 
 8 GroupEng   B     NA     24      24    24 NA   
 9 GroupMath  A     Girl   20.8    24    19  2.36
10 GroupMath  A     NA     24      24    24 NA   
11 GroupMath  B     Boy    24      24    24 NA   
12 GroupMath  B     Girl   21      24    18  2.58
13 GroupScie  A     Girl   21.7    24    20  2.08
14 GroupScie  A     NA     24      24    24 NA   
15 GroupScie  B     Boy    24      24    24 NA   
16 GroupScie  B     Girl   20.4    24    18  2.51
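Two details behind the NA entries above can be checked with base R alone. sd() computes the sample standard deviation, which divides by n - 1, so a single value gives NA; three values such as 21, 22 and 24 give about 1.53. Separately, the NA shown in the Sex column is the literal string "NA" from the sampling vector in the question, not R's missing value, which is why it appears as its own group. The vector sex_values below is made up for illustration.

```r
# Sample SD divides by n - 1, so a single observation gives NA
sd(24)
sd(c(21, 22, 24))

# Sex was sampled from c("Boy", "Girl", "NA"): "NA" is an ordinary string
sex_values <- c("Girl", "NA", "Boy")
is.na(sex_values)
table(sex_values)
```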

Then you can get the gender counts like so:

#### Get Grouped Gender Counts ####
sex <- groups %>% 
  group_by(Class,Group) %>% 
  count(Sex) %>% 
  ungroup()
sex

Which looks like this:

# A tibble: 16 × 4
   Class      Group Sex       n
   <chr>      <chr> <chr> <int>
 1 GroupChine A     Boy       1
 2 GroupChine A     Girl      3
 3 GroupChine B     Girl      5
 4 GroupChine B     NA        1
 5 GroupEng   A     Boy       1
 6 GroupEng   A     Girl      4
 7 GroupEng   B     Girl      4
 8 GroupEng   B     NA        1
 9 GroupMath  A     Girl      4
10 GroupMath  A     NA        1
11 GroupMath  B     Boy       1
12 GroupMath  B     Girl      4
13 GroupScie  A     Girl      3
14 GroupScie  A     NA        1
15 GroupScie  B     Boy       1
16 GroupScie  B     Girl      5
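As a side note, count() accepts several columns directly, so the group_by() and ungroup() steps around it can be dropped. A minimal sketch on made-up long-format data (long_toy is invented, in the same shape as groups) comparing the two forms:

```r
library(dplyr)

# Made-up long-format data in the same shape as `groups`
long_toy <- tibble(Class = c("GroupMath", "GroupMath", "GroupEng", "GroupEng"),
                   Group = c("A", "A", "B", "B"),
                   Sex = c("Girl", "Boy", "Girl", "Girl"))

# Form used in the answer: group, count, then ungroup
via_group_by <- long_toy %>%
  group_by(Class, Group) %>%
  count(Sex) %>%
  ungroup()

# Shorter form: pass all three columns to count()
direct <- long_toy %>% count(Class, Group, Sex)

direct
identical(as.data.frame(via_group_by), as.data.frame(direct))
```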

Joining Data Frames

Finally, you can join these two data frames like this:

#### Join ####
sums %>% 
  right_join(sex)

This gives you the final result. You can now see where the NA values come from: row 1, for example, includes only one boy, so its standard deviation cannot be computed:

Joining, by = c("Class", "Group", "Sex")
# A tibble: 16 × 8
   Class      Group Sex    Mean   Max   Min    sd     n
   <chr>      <chr> <chr> <dbl> <int> <int> <dbl> <int>
 1 GroupChine A     Boy    24      24    24 NA        1
 2 GroupChine A     Girl   22.3    24    21  1.53     3
 3 GroupChine B     Girl   20      24    18  2.35     5
 4 GroupChine B     NA     24      24    24 NA        1
 5 GroupEng   A     Boy    24      24    24 NA        1
 6 GroupEng   A     Girl   20.5    24    19  2.38     4
 7 GroupEng   B     Girl   21.2    24    18  2.5      4
 8 GroupEng   B     NA     24      24    24 NA        1
 9 GroupMath  A     Girl   20.8    24    19  2.36     4
10 GroupMath  A     NA     24      24    24 NA        1
11 GroupMath  B     Boy    24      24    24 NA        1
12 GroupMath  B     Girl   21      24    18  2.58     4
13 GroupScie  A     Girl   21.7    24    20  2.08     3
14 GroupScie  A     NA     24      24    24 NA        1
15 GroupScie  B     Boy    24      24    24 NA        1
16 GroupScie  B     Girl   20.4    24    18  2.51     5
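The join could also be avoided altogether: n() returns the size of the current group inside summarise(), so the counts can be computed in the same step as the age statistics. The sketch below rebuilds the question's data (without Name, which is unused) and assumes dplyr 1.0 or later for the .groups argument, which drops the grouping the same way ungroup() does in the answer. Because the statistics and the counts use the same grouping keys, this should give the same rows as the right_join() result.

```r
library(dplyr)
library(tidyr)

# Example data from the question (Name omitted; it is not used)
set.seed(500)
testdf <- data.frame(ID = paste0("Stu", 1:10),
                     Age = sample(18:25, 10, replace = TRUE),
                     Sex = sample(c("Boy", "Girl", "NA"), 10, replace = TRUE),
                     GroupMath = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupEng = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupScie = sample(LETTERS[1:2], 10, replace = TRUE),
                     GroupChine = sample(LETTERS[1:2], 10, replace = TRUE))

# Pivot, then compute age statistics and group sizes in one summarise()
sums_n <- testdf %>%
  pivot_longer(cols = starts_with("Group"),
               names_to = "Class",
               values_to = "Group") %>%
  group_by(Class, Group, Sex) %>%
  summarise(Mean = mean(Age),
            Max = max(Age),
            Min = min(Age),
            sd = sd(Age),
            n = n(),
            .groups = "drop")

sums_n
```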
