
Finding outliers more than a certain number of standard deviations from the mean in an R data frame

I understand that to find rows in a data frame that meet certain criteria (i.e., filtering data), I would use code similar to:

s[(s$age < 20 | s$age > 40)]

But how would I go about finding the outlier rows whose 'age' values lie more than 1 standard deviation above or below the mean?

s <- data.frame(
  sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
  flavor = c("original", "chicken", "original", "original", "cheese", "chicken", "cheese", "original"),
  age    = c(23, 25, 11, 5, 6, 44, 50, 2),
  scale  = c(4, 3, 2, 5, 4, 3, 1, 5)
)
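To see where the one-standard-deviation cutoffs fall for this data, here is a minimal sketch that computes the mean, the sample standard deviation (sd() divides by n - 1), and the resulting band. The variable names m, sd_age, lower and upper are only illustrative.

```r
# Sample data from the question
s <- data.frame(
  sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
  flavor = c("original", "chicken", "original", "original", "cheese", "chicken", "cheese", "original"),
  age    = c(23, 25, 11, 5, 6, 44, 50, 2),
  scale  = c(4, 3, 2, 5, 4, 3, 1, 5)
)

m      <- mean(s$age)  # 166 / 8 = 20.75
sd_age <- sd(s$age)    # sample SD (n - 1 denominator), about 18.25

# Band of +/- 1 standard deviation around the mean (illustrative names)
lower <- m - sd_age
upper <- m + sd_age
print(c(mean = m, sd = sd_age, lower = lower, upper = upper))
```

So any age below roughly 2.5 or above roughly 39.0 counts as an outlier here.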

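If you want to find the outliers using the mean and standard deviation computed once from the initial data, it's straightforward (negate the condition with ! to remove them instead of selecting them):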

s[s$age < mean(s$age) - sd(s$age) | s$age > mean(s$age) + sd(s$age), ]
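Since the title asks about "a certain number" of standard deviations rather than exactly one, the condition can be wrapped in a small helper with a multiplier k. The function name sd_outlier is hypothetical (it is not a base R function); it flags values whose distance from the mean exceeds k standard deviations, which is equivalent to the two-sided comparison above. Negating the flag with ! keeps the non-outliers instead.

```r
# Question's data
s <- data.frame(
  sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
  flavor = c("original", "chicken", "original", "original", "cheese", "chicken", "cheese", "original"),
  age    = c(23, 25, 11, 5, 6, 44, 50, 2),
  scale  = c(4, 3, 2, 5, 4, 3, 1, 5)
)

# Hypothetical helper (not part of base R): TRUE where x lies more than
# k sample standard deviations from the mean. Uses strict >, like the
# < and > comparisons above. NA values in x produce NA flags.
sd_outlier <- function(x, k = 1, na.rm = TRUE) {
  center <- mean(x, na.rm = na.rm)
  spread <- sd(x, na.rm = na.rm)
  abs(x - center) > k * spread
}

flag     <- sd_outlier(s$age, k = 1)
outliers <- s[flag, ]   # rows outside mean +/- 1 SD: s_6, s_7, s_8
kept     <- s[!flag, ]  # all remaining rows
print(outliers)
print(kept)
```

With k = 2 the band widens to roughly -15.75 to 57.25, so sd_outlier(s$age, k = 2) flags nothing in this sample.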

This uses the base R function sd(). Also, since you said you want to select rows of a data.frame, I added a comma to the indexing (s[condition, ]), so the condition selects rows and all columns are returned.
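To make the comma's effect concrete: with a single index, [ treats a data frame like a list and picks columns; in the s[i, j] form with j left empty, a logical vector picks rows and every column is kept. A minimal sketch on the question's data (the variable name rows is just for illustration):

```r
s <- data.frame(
  sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
  flavor = c("original", "chicken", "original", "original", "cheese", "chicken", "cheese", "original"),
  age    = c(23, 25, 11, 5, 6, 44, 50, 2),
  scale  = c(4, 3, 2, 5, 4, 3, 1, 5)
)

rows <- s$age < 20 | s$age > 40  # logical vector, one entry per row

# Row selection: logical vector before the comma, nothing after it = all columns
with_comma <- s[rows, ]
print(with_comma)

# Without the comma the single index is a column selector; a length-8
# logical vector against 4 columns fails with an error
without_comma <- try(s[rows], silent = TRUE)
print(inherits(without_comma, "try-error"))
```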

If you want a continuous, filtering-like approach, you can use the apply-family functions, as mentioned by @Sotos.
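The exact suggestion from @Sotos is not reproduced here, so the following is only one possible reading of a "continuous" approach, assumed for illustration: instead of using the initial statistics once, recompute the mean and standard deviation after each pass and keep dropping rows until nothing falls outside the band. It uses a plain repeat loop rather than an apply-family function, and the helper name drop_sd_outliers is hypothetical.

```r
s <- data.frame(
  sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
  flavor = c("original", "chicken", "original", "original", "cheese", "chicken", "cheese", "original"),
  age    = c(23, 25, 11, 5, 6, 44, 50, 2),
  scale  = c(4, 3, 2, 5, 4, 3, 1, 5)
)

# Hypothetical helper: repeatedly drop rows whose `col` value lies more than
# k SDs from the mean, recomputing mean and SD on the remaining rows each pass.
# Assumes the column has no NA values.
drop_sd_outliers <- function(df, col, k = 1) {
  repeat {
    x <- df[[col]]
    if (length(x) < 2) break               # sd() is NA for fewer than 2 values
    flag <- abs(x - mean(x)) > k * sd(x)
    if (!any(flag)) break                  # nothing left outside the band
    df <- df[!flag, , drop = FALSE]
  }
  df
}

res <- drop_sd_outliers(s, "age", k = 1)
print(res)
```

On this small sample, k = 1 is aggressive: each pass recomputes a narrower band, and the loop ends with only s_4 and s_5. With k = 2 the first pass flags nothing, so all eight rows are kept.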
