How do I compare group means to individual observations and make a new TRUE/FALSE column?
I am new to R and this is my first post on SO, so please bear with me.
I am trying to identify outliers in my dataset. I have two data.frames:
(1 - original data set, 192 rows): observations and their value (AvgConc)
(2 - created with dplyr, 24 rows): group averages from the original data set, along with quantiles, minimum, and maximum values
I want to create a new column within the original data set that gives TRUE/FALSE based on whether AvgConc is greater than the maximum or less than the minimum I have calculated in the second data.frame. How do I go about doing this?
Failed attempt:
Outliers <- Original.Data %>%
group_by(Status, Stim, Treatment) %>%
mutate(Outlier = Original.Data$AvgConc > Quantiles.Data$Maximum | Original.Data$AvgConc < Quantiles.Data$Minimum) %>%
as.data.frame()
Error: Column `Outlier` must be length 8 (the group size) or one, not 192
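Here, we need to remove the 'Quantiles.Data$' and 'Original.Data$' prefixes. Inside a grouped mutate(), 'Original.Data$AvgConc' is the full column of 192 values rather than the 8 rows of the current group, and the 24 rows of 'Quantiles.Data' are not lined up with the observations. Instead, join 'Quantiles.Data' to 'Original.Data' by 'Status', 'Stim', 'Treatment', so that every observation carries its own group's Maximum and Minimum, and then compare the bare column names.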
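The length mismatch behind the error can be reproduced on a small scale. The sketch below uses a made-up data frame (`toy` and its `Group` column are hypothetical, not part of the question's data) to show that a bare column name inside a grouped verb is the current group's slice, while `toy$AvgConc` always has the full length.

```r
library(dplyr)

# Hypothetical toy data: 2 groups of 3 rows each (illustration only)
toy <- data.frame(
  Group   = rep(c("a", "b"), each = 3),
  AvgConc = c(1, 2, 30, 4, 5, 6)
)

# Per group, compare the length of the bare column with the length of toy$AvgConc
lens <- toy %>%
  group_by(Group) %>%
  summarise(
    n_bare   = length(AvgConc),      # rows in the current group
    n_dollar = length(toy$AvgConc)   # always the whole column
  )

lens
```

In the question's data the same thing happens with 8 rows per group and 192 rows overall, which is why mutate() reports a length of 192 where it expects 8 or 1.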
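library(dplyr)

Original.Data %>%
  # bring each group's Maximum and Minimum onto every matching row
  inner_join(Quantiles.Data %>%
               select(Status, Stim, Treatment, Maximum, Minimum),
             by = c("Status", "Stim", "Treatment")) %>%
  group_by(Status, Stim, Treatment) %>%
  # compare row by row, using bare column names only
  mutate(Outlier = (AvgConc > Maximum) | (AvgConc < Minimum)) %>%
  as.data.frame()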
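For anyone who wants to try the join approach without the real data, here is a minimal reproducible sketch. The two data frames are made-up stand-ins that only borrow the column names from the question, and the Minimum/Maximum cut-offs are invented for illustration. The group_by() step is left out here because, once the limits are joined onto each row, the comparison is purely row-wise.

```r
library(dplyr)

# Hypothetical stand-in for Original.Data: 2 groups of 4 observations
Original.Data <- data.frame(
  Status    = rep(c("A", "B"), each = 4),
  Stim      = "s1",
  Treatment = "t1",
  AvgConc   = c(10, 11, 12, 50, 20, 21, 22, 23)
)

# Hypothetical stand-in for Quantiles.Data: one row of limits per group
# (the cut-off values are made up)
Quantiles.Data <- data.frame(
  Status    = c("A", "B"),
  Stim      = "s1",
  Treatment = "t1",
  Minimum   = c(5, 15),
  Maximum   = c(20, 30)
)

Outliers <- Original.Data %>%
  # attach each group's limits to its observations
  inner_join(Quantiles.Data, by = c("Status", "Stim", "Treatment")) %>%
  # flag values outside the group's [Minimum, Maximum] range
  mutate(Outlier = AvgConc > Maximum | AvgConc < Minimum)

Outliers$Outlier
```

With these toy values only the fourth observation (AvgConc = 50, above group A's Maximum of 20) should be flagged TRUE.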