
How to calculate means for nested groups and count number of observations in R

Hi all (R users), thank you very much in advance. I have a data set that contains students' scores from multiple states. Each state has different schools (10 schools in this example), each school is either 'public' or 'private', and there are test scores for three items. I need to calculate the mean of each item for each school, display the type of school, and then save the results to an Excel file for export.

The expected Excel file would include:

  1. a column with the name of the state,
  2. a column with the names of the schools (10 schools for each state),
  3. a column with the type of the school ('public' or 'private'),
  4. the number of students in each school,
  5. the mean of item1,
  6. the mean of item2, and
  7. the mean of item3.

library(randomNames)

# Example data to demonstrate the general concept:
ID = 1:50
states = rep(c("TS", "NE", "AR", "MO", "WA"), times = c(10, 10, 10, 10, 10))
schools = randomNames::randomNames(50) ## 50 random names in "Last, First" form, used as school names
type = rep(c("private", "public"), times = c(20, 30))
item1 = rnorm(50, mean=25, sd=5)
item2 = rnorm(50, mean=30, sd=5)
item3 = rnorm(50, mean=15, sd=5)
df = data.frame(ID, states, schools, type, item1, item2, item3)

Then I need to save it to Excel files, exporting each state separately with the following code:

# The code below works fine; I'm adding it only to explain the full concept.

list_data <- split(df, df$states)
Map(openxlsx::write.xlsx, list_data, paste0(names(list_data), '.xlsx'))

Thank you very much.

You can do this with the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

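df %>% 
  # one group per school within each state; type is kept as a grouping column
  dplyr::group_by(states, schools, type) %>% 
  # mean of every column whose name starts with "item", plus the row count per group
  dplyr::summarize(dplyr::across(tidyr::starts_with("item"), ~ mean(.)),
                   students = dplyr::n()) %>%
  dplyr::ungroup()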

   states schools             type   item1 item2 item3 students
   <chr>  <chr>               <chr>  <dbl> <dbl> <dbl>    <int>
 1 AR     al-Hosein, Zubaida  public  23.4  35.1 15.4         1
 2 AR     al-Mohamed, Raadiya public  24.5  30.8 13.5         1
 3 AR     Bluford, Sage       public  29.9  32.4  9.49        1
 4 AR     Covarrubias, Julio  public  19.8  27.8 15.2         1
 5 AR     el-Gad, Naaila      public  27.0  33.5 19.5         1
 6 AR     el-Mansour, Fawzia  public  34.4  25.4 17.9         1
 7 AR     el-Sadri,  Sakeena  public  24.7  30.5 13.9         1
 8 AR     Ewers, Benjamin     public  18.3  33.6 13.5         1
 9 AR     Rivas, Joel         public  16.8  25.1 20.5         1
10 AR     Wilson, Reneisha    public  28.9  28.5 18.5         1
# ... with 40 more rows

If you have other columns whose names start with item and that should not be averaged, replace the across(tidyr::starts_with(... line with explicit summaries: item1 = mean(item1), and so on.
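
As a rough sketch of that explicit form, the code below uses a small made-up data frame (the `toy` data, the school names, and the `item_notes` column are all hypothetical) in which one extra column starts with "item" but is not a score. Naming each summary explicitly avoids picking it up by accident:

```r
library(dplyr)

# Hypothetical toy data: two schools in one state, plus a non-score column
# ("item_notes") whose name also starts with "item".
toy <- data.frame(
  states     = c("AR", "AR", "AR"),
  schools    = c("School A", "School A", "School B"),
  type       = c("public", "public", "private"),
  item1      = c(20, 30, 25),
  item2      = c(28, 32, 30),
  item3      = c(10, 20, 15),
  item_notes = c("x", "y", "z")
)

school_means <- toy %>%
  group_by(states, schools, type) %>%
  summarize(
    students = n(),          # rows per school, taken as the student count
    item1    = mean(item1),  # each score column is named explicitly,
    item2    = mean(item2),  # so item_notes is never averaged
    item3    = mean(item3),
    .groups  = "drop"        # return an ungrouped result
  )

print(school_means)
```

Here `students` is placed before the means only to match the column order requested in the question; the order of the arguments inside summarize() determines the order of the output columns.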

The student count assumes that each row within a school and state is one student and that the type does not change for a given school.
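
If you are unsure whether the type really is constant within each school, one possible check is to count the distinct types per school before summarizing. This is only a sketch on hypothetical data (the `toy` data frame and the school names are invented for illustration), not part of the original answer:

```r
library(dplyr)

# Hypothetical toy data: "School A" is recorded with two different types,
# which violates the assumption.
toy <- data.frame(
  states  = c("AR", "AR", "AR", "MO"),
  schools = c("School A", "School A", "School B", "School C"),
  type    = c("public", "private", "private", "public"),
  item1   = c(20, 30, 25, 22)
)

# Keep only the schools that appear with more than one type.
inconsistent <- toy %>%
  group_by(states, schools) %>%
  summarize(n_types = n_distinct(type), students = n(), .groups = "drop") %>%
  filter(n_types > 1)

print(inconsistent)
```

An empty result means the assumption holds. Any school listed here would otherwise be split into one row per type by group_by(states, schools, type), and its student count would be split along with it.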
