Mean of a categorical variable's groups using the plyr package

Question

My categorical variable, risk, has three groups: ADV, HHM and POV.

I want to get the mean for each of these three groups on four continuous variables, read.5, read.6, read.7 and read.8, which are the reading scores of individuals over grades 5 to 8.

These are columns 2:5 of my dataset (MPLS[ ,2:5]), and it's an old textbook example. I used the code below, which apparently does not work even though it is supposed to be correct according to the textbook example:

myrisk <- ddply(.data = MPLS[ ,2:5], .variables = .(MPLS$risk),
                .fun = mean, na.rm = TRUE)

Earlier on I also got an error message for this piece of code:

mymeans <- mean(MPLS[ ,2:5], na.rm = TRUE)

When I googled it, I found that R had changed and I had to find another way to work out the means.

My questions are:

Has the ddply function from the plyr package, which I am currently trying to use, been superseded in the same way that the old mean function has?
How do I get the mean of the four columns for each group of a categorical variable, whether with the same function or with something different?

Thank you

Answer 1

Hi, you can use dplyr; it's more up to date.

df <- data.frame(risk   = rep(c("ADV", "HHM", "POV"), 10),
                 read.5 = rnorm(30, 30),
                 read.4 = rnorm(30, 30),
                 read.3 = rnorm(30, 30),
                 read.2 = rnorm(30, 30))
head(df)
#  risk   read.5   read.4   read.3   read.2
#1  ADV 30.78281 30.00721 29.80906 29.25936
#2  HHM 29.76175 29.63864 29.39256 29.40070
#3  POV 29.00964 30.48258 29.20662 28.77509
#4  ADV 29.60631 30.35032 32.00376 30.70374
#5  HHM 31.38653 30.28896 29.48756 30.32430
#6  POV 30.33102 30.40897 29.55796 30.10585

library(dplyr)

df %>% group_by(risk) %>% summarise_all(mean)

# A tibble: 3 x 5
#  risk  read.5 read.4 read.3 read.2
#  <fct>  <dbl>  <dbl>  <dbl>  <dbl>
#1 ADV     30.3   30.2   30.2   30.4
#2 HHM     29.7   30.5   29.8   29.9
#3 POV     29.3   30.2   29.9   30.2
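On the first question: plyr is retired (it still works, but its maintainers recommend dplyr for data frames), so ddply itself has not been removed. The original call most likely fails for the same reason as the earlier one: mean() no longer accepts a data frame in current R, and ddply hands each group to .fun as a data frame. Below is a minimal sketch of three ways to get the group means. MPLS_demo is a made-up stand-in for the real MPLS data, assuming it has a risk column plus read.5 to read.8; the object names by_plyr, by_dplyr and by_base are only for illustration. The dplyr version assumes dplyr 1.0.0 or later, where across() supersedes summarise_all().

```r
# Hypothetical stand-in for the asker's MPLS data: risk plus read.5 ... read.8.
set.seed(1)
MPLS_demo <- data.frame(
  risk   = rep(c("ADV", "HHM", "POV"), 10),
  read.5 = rnorm(30, 30),
  read.6 = rnorm(30, 30),
  read.7 = rnorm(30, 30),
  read.8 = rnorm(30, 30)
)
MPLS_demo$read.8[1] <- NA  # one missing score, so na.rm = TRUE matters

# 1. plyr: pass the whole data frame, name the grouping column, and apply
#    mean() column by column (numcolwise) instead of to a data frame.
by_plyr <- plyr::ddply(MPLS_demo, "risk", plyr::numcolwise(mean), na.rm = TRUE)

# 2. dplyr (>= 1.0.0): across() applies the function to each selected column.
by_dplyr <- dplyr::summarise(
  dplyr::group_by(MPLS_demo, risk),
  dplyr::across(read.5:read.8, ~ mean(.x, na.rm = TRUE))
)

# 3. Base R: aggregate() with the data frame method, no extra packages.
by_base <- aggregate(MPLS_demo[, 2:5],
                     by = list(risk = MPLS_demo$risk),
                     FUN = mean, na.rm = TRUE)

by_plyr
by_dplyr
by_base
```

All three return one row per risk group with the mean of each reading column. The functions are called with their package prefix (plyr::, dplyr::) so that loading both packages at once does not cause name clashes such as summarise().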

Answer 1: accepted, posted 2020-04-22 15:56:28
