How can I take standard deviations of only certain rows within a column in a data frame in R?
I have a data frame that contains 652 rows (products, specifically) and 3 columns of interest: stand_aov, stand_cr, and cluster_labels10. I'm interested in finding the standard deviation of stand_aov and stand_cr within each cluster_labels10 group, and then exporting that to a simple data frame that just lists the standard deviations of stand_aov and stand_cr by cluster_labels10.
Every one of the 652 products falls into a cluster_labels10 group, and all are labeled from 1 to 10.
Ideally the output would contain just 3 columns (the cluster_labels10 ID, the standard deviation of stand_aov for each cluster, and the standard deviation of stand_cr for each cluster) and 10 rows, one for each value of cluster_labels10.
Just to give an example of what the first row might look like:
cluster_labels10 stdev_stand_aov stdev_stand_cr
1 .001 .001
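For reference, each value in that table is the ordinary sample standard deviation that R's sd() returns, which divides by n - 1 rather than n. Writing n_k for the number of products in cluster k and x_1, ..., x_{n_k} for their values in one column (say stand_aov), with s_k and \bar{x}_k as illustrative notation only:

```latex
s_k = \sqrt{\frac{1}{n_k - 1} \sum_{i=1}^{n_k} \left(x_i - \bar{x}_k\right)^2},
\qquad
\bar{x}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_i
```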
Base R solution using aggregate:
set.seed(123)
df <- data.frame(
cluster_labels10 = rep(c(1, 2, 3), each = 5),
stand_aov = rnorm(15),
stand_cr = rnorm(15)
)
aggregate(df[2:3], list(df$cluster_labels10), sd)
Group.1 stand_aov stand_cr
1 1 0.8110218 1.4110413
2 2 1.1634896 0.3445583
3 3 0.6394632 1.2619931
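If you want the result to carry the column names asked for in the question instead of Group.1, a possible variant uses aggregate()'s formula interface. This is a sketch on the same simulated df; the stdev_ prefix simply follows the naming in the question:

```r
set.seed(123)
df <- data.frame(
  cluster_labels10 = rep(c(1, 2, 3), each = 5),
  stand_aov = rnorm(15),
  stand_cr = rnorm(15)
)

# cbind() on the left lists the columns to summarise; the right-hand side is
# the grouping column, which keeps its own name instead of becoming "Group.1"
sds <- aggregate(cbind(stand_aov, stand_cr) ~ cluster_labels10, data = df, FUN = sd)

# Prefix the summary columns so the layout matches the requested output
names(sds)[-1] <- paste0("stdev_", names(sds)[-1])
sds
```

One caveat: the formula method drops any row with an NA in the listed columns by default (na.action = na.omit), whereas the data frame call above passes NAs through to sd(), so the affected cluster would come back as NA.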
Perhaps you can use dplyr:
library(dplyr)
set.seed(123)
DF <- data.frame(
cluster_labels10 = rep(c(1, 2, 3), each = 5),
stand_aov = rnorm(15),
stand_cr = rnorm(15)
)
> DF
cluster_labels10 stand_aov stand_cr
1 1 -0.56047565 1.7869131
2 1 -0.23017749 0.4978505
3 1 1.55870831 -1.9666172
4 1 0.07050839 0.7013559
5 1 0.12928774 -0.4727914
6 2 1.71506499 -1.0678237
7 2 0.46091621 -0.2179749
8 2 -1.26506123 -1.0260044
9 2 -0.68685285 -0.7288912
10 2 -0.44566197 -0.6250393
11 3 1.22408180 -1.6866933
12 3 0.35981383 0.8377870
13 3 0.40077145 0.1533731
14 3 0.11068272 -1.1381369
15 3 -0.55584113 1.2538149
DF %>%
group_by(cluster_labels10) %>%
summarise(stdev_stand_aov = sd(stand_aov), stdev_stand_cr = sd(stand_cr))
Output:
Source: local data frame [3 x 3]
cluster_labels10 stdev_stand_aov stdev_stand_cr
1 1 0.8110218 1.4110413
2 2 1.1634896 0.3445583
3 3 0.6394632 1.2619931
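Under the hood, each per-cluster value is just sd() applied to the rows of one column that share a label. A minimal base R sketch of that idea for cluster 1 (it re-creates the example data, so dplyr is not needed) selects the rows with logical indexing and checks sd() against the n - 1 formula written out by hand:

```r
set.seed(123)
DF <- data.frame(
  cluster_labels10 = rep(c(1, 2, 3), each = 5),
  stand_aov = rnorm(15),
  stand_cr = rnorm(15)
)

# Logical indexing keeps only the stand_aov values of rows in cluster 1
x <- DF$stand_aov[DF$cluster_labels10 == 1]

# sd() is the sample standard deviation: squared deviations divided by n - 1
manual_sd <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
c(sd = sd(x), manual = manual_sd)
```

Both entries should equal the 0.8110218 reported for cluster 1 in the output above.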
Another base R approach using by:
# generate random data (no seed is set, so your numbers will differ from the results below)
dat <- data.frame(cluster_labels10 = sample(1:10, size = 652, replace = TRUE),
stand_aov = rnorm(n = 652), stand_cr = rnorm(n = 652))
Use by to calculate statistics for groups of observations defined by cluster_labels10:
sd.1 <- by(data = dat$stand_aov, INDICES = dat$cluster_labels10, FUN = sd)
sd.2 <- by(data = dat$stand_cr, INDICES = dat$cluster_labels10, FUN = sd)
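# combine the per-cluster results into a data frame; as.numeric() drops the "by" class
final <- data.frame(cluster_labels10 = as.numeric(names(sd.1)),
stdev_stand_aov = as.numeric(sd.1), stdev_stand_cr = as.numeric(sd.2))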
Results:
final
# cluster_labels10 stdev_stand_aov stdev_stand_cr
# 1 1 0.8785011 1.0402992
# 2 2 1.0942536 1.3726442
# 3 3 0.9294320 0.9795918
# 4 4 1.1355244 1.1050766
# 5 5 1.0023296 0.8770729
# 6 6 1.1367627 0.9499932
# 7 7 0.9796322 0.9257972
# 8 8 0.9715574 1.0221725
# 9 9 0.9044647 1.0052602
# 10 10 1.1215173 1.1609340
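A more compact base R variant, offered here as an alternative to two separate by() calls, applies tapply() to each column through sapply() and then reshapes the result into the requested three-column data frame. The seed is an arbitrary choice that only makes the simulated data reproducible:

```r
# Simulated data shaped like the question's: 652 products, labels 1 to 10
set.seed(42)
dat <- data.frame(cluster_labels10 = sample(1:10, size = 652, replace = TRUE),
                  stand_aov = rnorm(n = 652), stand_cr = rnorm(n = 652))

# For each column, tapply() splits it by cluster and applies sd();
# sapply() binds the per-column results into a cluster-by-column matrix
sd_mat <- sapply(dat[c("stand_aov", "stand_cr")], tapply, dat$cluster_labels10, sd)

# Reshape the matrix into the requested three-column data frame
final <- data.frame(cluster_labels10 = as.numeric(rownames(sd_mat)),
                    stdev_stand_aov = sd_mat[, "stand_aov"],
                    stdev_stand_cr = sd_mat[, "stand_cr"],
                    row.names = NULL)
final
```

Because tapply() turns the labels into a factor, rows come out sorted by cluster label; a label with no products would simply not appear, and a cluster with a single product would get NA from sd().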