How can I take standard deviations of only certain rows within a column in a data frame in R?

I have a data frame with 652 rows (one per product) and 3 columns of interest: stand_aov, stand_cr, and cluster_labels10. I want to find the standard deviation of stand_aov and stand_cr within each cluster_labels10 group, then export the result to a simple data frame that lists the two standard deviations by cluster. Every one of the 652 products is assigned to a cluster, and the clusters are labeled 1 through 10.

Ideally the output would contain just 3 columns (the cluster_labels10 ID, the standard deviation of stand_aov for each cluster, and the standard deviation of stand_cr for each cluster) and 10 rows, one for each value of cluster_labels10.

As an example, the first row might look like this:

cluster_labels10 stdev_stand_aov stdev_stand_cr
               1            .001           .001

Base R solution using aggregate:

set.seed(123)
df <- data.frame(
        cluster_labels10 = rep(c(1, 2, 3), each = 5),
        stand_aov = rnorm(15),
        stand_cr = rnorm(15)
)

aggregate(df[2:3], list(df$cluster_labels10), sd)
  Group.1 stand_aov  stand_cr
1       1 0.8110218 1.4110413
2       2 1.1634896 0.3445583
3       3 0.6394632 1.2619931
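
The aggregate call above names the grouping column Group.1 and keeps the original value column names. As a minimal sketch (not part of the original answer), the formula interface of aggregate keeps the grouping column's own name, and the value columns can then be renamed to match the layout asked for in the question. The object name res is only illustrative.

```r
set.seed(123)
df <- data.frame(
  cluster_labels10 = rep(c(1, 2, 3), each = 5),
  stand_aov = rnorm(15),
  stand_cr = rnorm(15)
)

# Formula interface: measured columns on the left, grouping column on the right
res <- aggregate(cbind(stand_aov, stand_cr) ~ cluster_labels10, data = df, FUN = sd)

# Rename the value columns to the names requested in the question
names(res) <- c("cluster_labels10", "stdev_stand_aov", "stdev_stand_cr")
res
```

The result is an ordinary data frame with one row per cluster, so it can be written out directly, for example with write.csv.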

Perhaps you can use dplyr:

library(dplyr)
set.seed(123)
DF <- data.frame(
  cluster_labels10 = rep(c(1, 2, 3), each = 5),
  stand_aov = rnorm(15),
  stand_cr = rnorm(15)
)


> DF
   cluster_labels10   stand_aov   stand_cr
1                 1 -0.56047565  1.7869131
2                 1 -0.23017749  0.4978505
3                 1  1.55870831 -1.9666172
4                 1  0.07050839  0.7013559
5                 1  0.12928774 -0.4727914
6                 2  1.71506499 -1.0678237
7                 2  0.46091621 -0.2179749
8                 2 -1.26506123 -1.0260044
9                 2 -0.68685285 -0.7288912
10                2 -0.44566197 -0.6250393
11                3  1.22408180 -1.6866933
12                3  0.35981383  0.8377870
13                3  0.40077145  0.1533731
14                3  0.11068272 -1.1381369
15                3 -0.55584113  1.2538149

DF %>% 
  group_by(cluster_labels10) %>% 
  summarise(x = sd(stand_aov), y = sd(stand_cr))

Output:

Source: local data frame [3 x 3]

  cluster_labels10         x         y
1                1 0.8110218 1.4110413
2                2 1.1634896 0.3445583
3                3 0.6394632 1.2619931
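
If more columns need the same treatment, naming each summary by hand gets repetitive. The following is a sketch that assumes dplyr 1.0.0 or later, where across() is available. It applies sd to both columns and builds the output names with a stdev_ prefix. The object name res is only illustrative.

```r
library(dplyr)

set.seed(123)
DF <- data.frame(
  cluster_labels10 = rep(c(1, 2, 3), each = 5),
  stand_aov = rnorm(15),
  stand_cr = rnorm(15)
)

# Apply sd() to each listed column within each cluster;
# .names controls how the output columns are labeled
res <- DF %>%
  group_by(cluster_labels10) %>%
  summarise(across(c(stand_aov, stand_cr), sd, .names = "stdev_{.col}"))
res
```

This yields the columns cluster_labels10, stdev_stand_aov, and stdev_stand_cr, matching the layout described in the question.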

Another base R approach using by:

# generate random data
dat <- data.frame(cluster_labels10 = sample(1:10, size = 652, replace = TRUE), 
                  stand_aov = rnorm(n = 652), stand_cr = rnorm(n = 652))

Use by to calculate the statistic for each group of observations defined by cluster_labels10:

sd.1 <- by(data = dat$stand_aov, INDICES = dat$cluster_labels10, FUN = sd)
sd.2 <- by(data = dat$stand_cr, INDICES = dat$cluster_labels10, FUN = sd)
final <- cbind(cluster_labels10 = as.numeric(names(sd.1)), 
               stdev_stand_aov = sd.1, stdev_stand_cr = sd.2)

Results:

final

#    cluster_labels10 stdev_stand_aov stdev_stand_cr
# 1                 1       0.8785011      1.0402992
# 2                 2       1.0942536      1.3726442
# 3                 3       0.9294320      0.9795918
# 4                 4       1.1355244      1.1050766
# 5                 5       1.0023296      0.8770729
# 6                 6       1.1367627      0.9499932
# 7                 7       0.9796322      0.9257972
# 8                 8       0.9715574      1.0221725
# 9                 9       0.9044647      1.0052602
# 10               10       1.1215173      1.1609340
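
Note that cbind returns a matrix rather than a data frame. Since the question asks for a data frame, here is a minimal sketch of the same idea using tapply and data.frame. The seed and the names sd_aov, sd_cr, and final_df are assumptions added for illustration, so the numbers will differ from the results shown above.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Generate random data with the same shape as in the question
dat <- data.frame(
  cluster_labels10 = sample(1:10, size = 652, replace = TRUE),
  stand_aov = rnorm(n = 652),
  stand_cr = rnorm(n = 652)
)

# tapply returns one named value per cluster label
sd_aov <- tapply(dat$stand_aov, dat$cluster_labels10, sd)
sd_cr  <- tapply(dat$stand_cr,  dat$cluster_labels10, sd)

# Assemble a plain data frame with one row per cluster
final_df <- data.frame(
  cluster_labels10 = as.integer(names(sd_aov)),
  stdev_stand_aov = as.vector(sd_aov),
  stdev_stand_cr = as.vector(sd_cr)
)
final_df
```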

