R: calculate mean square for all subgroups in a subset

How do I calculate the mean square for all of 2019_Preston_STD, 2019_Preston_V1, 2019_Preston_V2, etc., using the Value column, and then the adjmth1 and adjmth3 columns?

structure(list(IDX = c("2019_Preston_STD", "2019_Preston_V1", 
"2019_Preston_V2", "2019_Preston_V3", "2019_Preston_W1", "2019_Preston_W2"
), Value = c(3L, 2L, 3L, 2L, 3L, 5L), adjmth1 = c(2.87777777777778, 
1.85555555555556, 2.01111111111111, 1.77777777777778, 3.62222222222222, 
4.45555555555556), adjmth3 = c(2.9328763348507, 2.08651828334684, 
2.80282946626847, 2.15028039284054, 2.68766916156347, 4.51425274916654
), adjmth13 = c(2.81065411262847, 1.82585524933201, 1.81394057737959, 
1.40785681078568, 3.30989138378569, 4.7301083495049)), row.names = 29:34, class = "data.frame")

This task can be done in many ways, as shown in the link that @r2evans pointed out. My favorite is dplyr with summarize(across()), because I find its syntax easy to understand and easy to apply to many columns. It also presents the results in a nicely formatted table.

For example, suppose I want the arithmetic mean of Sepal.Length, Petal.Length, and Petal.Width for each species in the iris data: setosa, versicolor, and virginica. Here is the head of the data:

head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

And here is how to get the mean for each species:

library(dplyr)  # provides %>%, group_by(), summarize(), across()

iris %>% group_by(Species) %>% 
         summarize(across(c(Sepal.Length, Petal.Length, Petal.Width), mean))
# A tibble: 3 x 4
# Species    Sepal.Length Petal.Length Petal.Width
# <fct>             <dbl>        <dbl>       <dbl>
# 1 setosa             5.01         1.46       0.246
# 2 versicolor         5.94         4.26       1.33 
# 3 virginica          6.59         5.55       2.03 

As for your task, first you need to define a function for the mean square (because its definition varies slightly between references). Then you apply it to your data frame using summarize(across()).

For example, you could define the mean square function as follows:

meansq <- function(x) sum((x-mean(x))^2)/(length(x)-1)

Note: This definition requires that length(x) is not 1; otherwise NaN is produced.
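
As a quick sanity check of this definition, here is a small sketch in base R that applies meansq to the Value column from the question. With the n - 1 denominator, the result matches R's built-in var(), and a single-element input gives NaN. The vector name value is only for illustration.

```r
# Mean square as defined above: sum of squared deviations over (n - 1)
meansq <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# Value column from the question's sample data
value <- c(3L, 2L, 3L, 2L, 3L, 5L)

meansq(value)                          # 1.2
all.equal(meansq(value), var(value))   # TRUE: same as the sample variance
meansq(3)                              # NaN, because this is 0 / 0
```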

You can apply it to your data frame newdata as follows:

newdata %>% group_by(IDX) %>% 
            summarize(across(c(Value, adjmth1, adjmth3), meansq))
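
One caveat with the sample shown in the question: each IDX value occurs only once there, so grouping by IDX gives one-row groups and meansq returns NaN for every column. If the intended subgroup is the year and site shared by those rows (an assumption on my part, not something the question states), a sketch like the following derives a grouping key by stripping the last underscore-separated part of IDX. The column name grp and the regular expression are hypothetical choices for illustration.

```r
library(dplyr)

meansq <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# Sample data from the question (adjmth13 omitted for brevity)
newdata <- data.frame(
  IDX = c("2019_Preston_STD", "2019_Preston_V1", "2019_Preston_V2",
          "2019_Preston_V3", "2019_Preston_W1", "2019_Preston_W2"),
  Value = c(3L, 2L, 3L, 2L, 3L, 5L),
  adjmth1 = c(2.87777777777778, 1.85555555555556, 2.01111111111111,
              1.77777777777778, 3.62222222222222, 4.45555555555556),
  adjmth3 = c(2.9328763348507, 2.08651828334684, 2.80282946626847,
              2.15028039284054, 2.68766916156347, 4.51425274916654)
)

result <- newdata %>%
  # Assumed grouping key: "2019_Preston_STD" -> "2019_Preston"
  mutate(grp = sub("_[^_]+$", "", IDX)) %>%
  group_by(grp) %>%
  summarize(across(c(Value, adjmth1, adjmth3), meansq))

result
```

With this sample there is a single group, 2019_Preston, and its mean square for Value is 1.2. On a full data set with several years or sites, each would get its own row.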
