How to calculate means for nested groups and count number of observations in R
All (R users), thank you very much in advance. I have a data set that contains students' scores from multiple states. Each state has different schools (10 schools in this example), each school is either 'public' or 'private', and there are test scores for three items. I need to calculate the mean of each item for each school, display the type of school, and then save the results to an Excel file for export.
The expected result of the Excel file would include:
library(randomNames)
# Example data to demonstrate the general concept:
ID = 1:50
states = rep(c("TS", "NE", "AR", "MO", "WA"),times = c(10, 10, 10, 10, 10))
schools = randomNames::randomNames(50) ## 50 random names in "Last, First" form, one per school
type = rep(c("private", "public"),times = c(20,30))
item1 = rnorm(50, mean=25, sd=5)
item2 = rnorm(50, mean=30, sd=5)
item3 = rnorm(50, mean=15, sd=5)
df = data.frame(ID, states, schools, type, item1, item2, item3)
Then I need to save it to Excel, exporting each state to a separate file, using the following code:
# The code below works fine; I'm just adding it to explain the full concept.
list_data <- split(df, df$states)
Map(openxlsx::write.xlsx, list_data, paste0(names(list_data), '.xlsx'))
Thank you very much.
You can do this with the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
df %>%
  dplyr::group_by(states, schools, type) %>%
  dplyr::summarize(across(tidyr::starts_with("item"), ~ mean(.)),
                   students = n()) %>%
  dplyr::ungroup()
   states schools             type   item1 item2 item3 students
   <chr>  <chr>               <chr>  <dbl> <dbl> <dbl>    <int>
 1 AR     al-Hosein, Zubaida  public  23.4  35.1  15.4        1
 2 AR     al-Mohamed, Raadiya public  24.5  30.8  13.5        1
 3 AR     Bluford, Sage       public  29.9  32.4  9.49        1
 4 AR     Covarrubias, Julio  public  19.8  27.8  15.2        1
 5 AR     el-Gad, Naaila      public  27.0  33.5  19.5        1
 6 AR     el-Mansour, Fawzia  public  34.4  25.4  17.9        1
 7 AR     el-Sadri, Sakeena   public  24.7  30.5  13.9        1
 8 AR     Ewers, Benjamin     public  18.3  33.6  13.5        1
 9 AR     Rivas, Joel         public  16.8  25.1  20.5        1
10 AR     Wilson, Reneisha    public  28.9  28.5  18.5        1
# ... with 40 more rows
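To connect the summary with the per-state export from the question, here is a minimal sketch. It assumes the summarized result is stored in a variable (called `school_means` here, a name not in the original), uses a small made-up data frame in place of the question's `df`, and writes to a temporary directory rather than the working directory.

```r
library(dplyr)

# Small stand-in for the question's df (hypothetical values)
df <- data.frame(
  states  = c("AR", "AR", "AR", "MO", "MO"),
  schools = c("School A", "School A", "School B", "School C", "School C"),
  type    = c("public", "public", "private", "public", "public"),
  item1   = c(20, 30, 25, 10, 20),
  item2   = c(30, 40, 35, 20, 30),
  item3   = c(10, 20, 15, 5, 15)
)

# Per-school item means plus the number of rows (students) per school
school_means <- df %>%
  group_by(states, schools, type) %>%
  summarize(across(starts_with("item"), mean),
            students = n(),
            .groups = "drop")

# Split the summary by state and write one workbook per state,
# mirroring the split/Map approach from the question
by_state  <- split(school_means, school_means$states)
out_files <- file.path(tempdir(), paste0(names(by_state), ".xlsx"))
invisible(Map(openxlsx::write.xlsx, by_state, out_files))
```

Splitting the summary rather than the raw data means each workbook contains one row per school instead of one row per student.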
If you have other column names that start with item, you can change the across(tidyr::starts_with(.... line to item1 = mean(item1) and so on, naming each column you want averaged.
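The explicit-column variant might look like the sketch below. The extra column `item_note` and the sample values are made up for illustration; they only stand in for "another column whose name starts with item".

```r
library(dplyr)

# Hypothetical data with an extra column that also starts with "item"
df <- data.frame(
  states    = c("AR", "AR", "MO"),
  schools   = c("School A", "School A", "School C"),
  type      = c("public", "public", "private"),
  item1     = c(20, 30, 10),
  item2     = c(30, 40, 20),
  item3     = c(10, 20, 5),
  item_note = c("a", "b", "c")  # should not be averaged
)

# Name each score column explicitly so item_note is left out
school_means <- df %>%
  group_by(states, schools, type) %>%
  summarize(item1 = mean(item1),
            item2 = mean(item2),
            item3 = mean(item3),
            students = n(),
            .groups = "drop")
```

Listing the columns is more verbose than `across()`, but it makes it obvious which columns are summarized.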
The student count assumes that each row within a school and state is one student, and that the type does not change for a given school.
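If you are unsure whether the second assumption holds in your data, a quick check along these lines may help. The data frame and the name `mixed_type` are illustrative only; the sketch lists any school that appears with more than one type.

```r
library(dplyr)

# Hypothetical data in which "School A" is recorded with two different types
df <- data.frame(
  states  = c("AR", "AR", "AR", "MO"),
  schools = c("School A", "School A", "School B", "School C"),
  type    = c("public", "private", "public", "public")
)

# Count distinct types per school and keep schools with more than one
mixed_type <- df %>%
  group_by(states, schools) %>%
  summarize(n_types = n_distinct(type), .groups = "drop") %>%
  filter(n_types > 1)
```

An empty result means every school has a single type, so grouping by `type` as well does not split any school across rows.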