I understand that to find rows in a data frame that meet certain criteria (i.e. filtering data) I would use code similar to:
s[(s$age < 20 | s$age > 40)]
But how would I go about finding the outlier rows whose 'age' values are more than 1 standard deviation above or below the mean?
s <- data.frame(
  sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
  flavor = c("original", "chicken", "original", "original", "cheese", "chicken", "cheese", "original"),
  age = c(23, 25, 11, 5, 6, 44, 50, 2),
  scale = c(4, 3, 2, 5, 4, 3, 1, 5))
If you want to select the outliers based on the mean and standard deviation of the initial data, it's straightforward:
s[s$age < mean(s$age) - sd(s$age) | s$age > mean(s$age) + sd(s$age), ]
This uses the built-in function sd(). Also, since you stated you want to select rows of a data.frame, I added a comma to the indexing so that it returns all columns.
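As a minimal sketch on the sample data from the question, the mean and standard deviation can be computed once and reused. The names m, sdev, outliers and kept are only illustrative, and abs(s$age - m) > sdev is just a more compact way of writing the same two-sided condition:

```r
# Sample data from the question
s <- data.frame(
  sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
  flavor = c("original", "chicken", "original", "original",
             "cheese", "chicken", "cheese", "original"),
  age    = c(23, 25, 11, 5, 6, 44, 50, 2),
  scale  = c(4, 3, 2, 5, 4, 3, 1, 5)
)

# Compute the statistics once (m and sdev are illustrative names)
m    <- mean(s$age)
sdev <- sd(s$age)

# Rows whose age lies more than 1 standard deviation from the mean;
# the trailing comma keeps all columns
outliers <- s[abs(s$age - m) > sdev, ]

# Negating the condition keeps the non-outlier rows instead
kept <- s[abs(s$age - m) <= sdev, ]
```

With this data the mean is 20.75 and the standard deviation is roughly 18.25, so outliers should contain the rows for s_6, s_7 and s_8.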
If you want a continuous, filtering-like approach, you can use the apply-family functions, as mentioned by @Sotos.