Car 100 200 300
Group1 34 35 34
Group1 57 67 34
Group1 68 76 6
Group2 45 23 23
I am having problems detecting outliers in my dataframe. For each group, I want to detect whether a complete vector (one row) is an outlier relative to the other rows of the same group (for example rows one to three for Group1). I also want to detect whether there is an outlier within one specific row. For this I found the following solution, but with this code I have to repeat everything for every single row and check the output table for a "TRUE". Can this be automated? For example, could all outputs be collected in a logical matrix so that I only have to check sum(matrix == TRUE)?
The code:
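library(outliers)

# Values of the first row (columns 1 to 400) of the data frame
x <- as.numeric(data_without[1, 1:400])

# Repeat grubbs.test() on the remaining values until p >= 0.05,
# collecting every value reported as an outlier along the way
grubbs.flag <- function(x) {
  outliers <- NULL
  test <- x
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  while (pv < 0.05) {
    # $alternative reads e.g. "highest value 300 is an outlier";
    # its third word is the flagged value
    outliers <- c(outliers, as.numeric(strsplit(grubbs.result$alternative, " ")[[1]][3]))
    test <- x[!x %in% outliers]
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
  }
  return(data.frame(X = x, Outlier = (x %in% outliers)))
}

grubbs.flag(x)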
X Outlier
1 0.1157 FALSE
2 0.1152 FALSE
3 0.1163 FALSE
4 0.1165 FALSE
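A sketch of the requested automation, assuming the data is numeric, has one vector to check per row, and contains no missing values. `grubbs_flag_vec()` and `flag_matrix()` are hypothetical names. The first follows the same iterative idea as `grubbs.flag()` above, but instead of parsing the third word of `$alternative` it takes the extreme value farther from the mean, which is the value `grubbs.test()` reports with its default `type = 10` (an assumption based on the test's own description). Applying it to every row with `apply()` yields a logical matrix, so a single `sum()` counts all flagged values.

```r
library(outliers)

# Hypothetical helper: iterative Grubbs flagging for one numeric vector,
# following the same idea as grubbs.flag(). Assumes no missing values.
grubbs_flag_vec <- function(x, alpha = 0.05) {
  flagged <- rep(FALSE, length(x))
  repeat {
    vals <- x[!flagged]
    # Stop when the statistic cannot be computed: too few or identical values
    if (length(vals) < 3 || sd(vals) == 0) break
    p <- grubbs.test(vals)$p.value
    if (is.na(p) || p >= alpha) break
    # Assumption: with the default type = 10 the tested value is the extreme
    # (min or max) farther from the mean, so take it directly instead of
    # parsing the text in $alternative
    m <- mean(vals)
    suspect <- if (max(vals) - m < m - min(vals)) min(vals) else max(vals)
    # Like x %in% outliers in grubbs.flag(), this flags every copy of the value
    flagged <- flagged | x == suspect
  }
  flagged
}

# Hypothetical helper: run the check on every row. apply() returns one column
# per input row, so t() restores the original orientation.
flag_matrix <- function(m, alpha = 0.05) {
  t(apply(m, 1, grubbs_flag_vec, alpha = alpha))
}

# Usage with made-up data (one row per vector to check)
dat <- rbind(a = c(1, 2, 3, 2, 1, 2, 50),
             b = c(5, 6, 5, 6, 5, 6, 5))
fm <- flag_matrix(dat)
sum(fm)                      # total number of flagged values
which(fm, arr.ind = TRUE)    # row and column of each flagged value
```

For the question's data this would be something like `sum(flag_matrix(data_without[, 1:400]))`; the `alpha` argument replaces the hard-coded 0.05.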
I've read the function documentation: with the default settings (type = 10), grubbs.test() only checks whether there is a single outlier in the data. Therefore I think it suffices to run the test only once per group.
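A minimal illustration of that behaviour, using a made-up vector `v`: each call of `grubbs.test()` reports exactly one suspect value, and the documented `opposite = TRUE` argument switches the test to the other extreme. Checking more than one value therefore needs repeated calls, as in the question's `grubbs.flag()`.

```r
library(outliers)

# Made-up vector with one clear high outlier
v <- c(1, 2, 3, 2, 1, 2, 50)

# Default type = 10: a single test of the most suspicious extreme
res_far <- grubbs.test(v)
res_far$alternative   # names the one value that was tested
res_far$p.value

# opposite = TRUE tests the other extreme instead (here the minimum)
res_other <- grubbs.test(v, opposite = TRUE)
res_other$alternative
res_other$p.value
```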
First the data is split by group, and then the test is applied to each group with lapply(). Only the p-value and the description are returned, so you can see which value is the outlier, if any; it will always be either the group's maximum or its minimum. Note that the rows of a group (the three group1 rows here) are pooled and tested as one sample.
library(outliers)

df <- t(data.frame(car = c(100, 200, 300),
                   g1  = c(34, 35, 34),
                   g1  = c(57, 67, 34),
                   g1  = c(68, 76, 6),
                   g2  = c(45, 23, 23)))
row.names(df) <- c("car", "group1", "group1", "group1", "group2")

# One list element per group: the rows whose name matches that group
lst <- lapply(unique(row.names(df)), function(g) {
  df[row.names(df) == g, ]
})
lst
[[1]]
[1] 100 200 300
[[2]]
[,1] [,2] [,3]
group1 34 35 34
group1 57 67 34
group1 68 76 6
[[3]]
[1] 45 23 23
# Run one Grubbs test per group and return its p-value and description
lapply(lst, function(x) {
  tst <- grubbs.test(x)
  c(tst$p.value, tst$alternative)
})
[[1]]
[1] "0.5" "highest value 300 is an outlier"
[[2]]
[1] "0.244875529263511" "lowest value 6 is an outlier"
[[3]]
[1] "0" "highest value 45 is an outlier"
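If numeric p-values are more convenient than the character vectors above (`c()` coerces the p-value to a string), the same per-group tests can be collected into one data frame. This is a sketch assuming the 0.05 significance level used in the question's code; the column names `group`, `p.value`, `alternative` and `outlier` are illustrative choices, and `as.vector()` makes explicit that a group's rows are pooled into one sample.

```r
library(outliers)

df <- t(data.frame(car = c(100, 200, 300),
                   g1  = c(34, 35, 34),
                   g1  = c(57, 67, 34),
                   g1  = c(68, 76, 6),
                   g2  = c(45, 23, 23)))
row.names(df) <- c("car", "group1", "group1", "group1", "group2")

# One Grubbs test per group; all rows of a group form a single sample
groups <- unique(row.names(df))
res <- do.call(rbind, lapply(groups, function(g) {
  tst <- grubbs.test(as.vector(df[row.names(df) == g, ]))
  data.frame(group = g,
             p.value = tst$p.value,
             alternative = tst$alternative,
             stringsAsFactors = FALSE)
}))

# Assumed significance level, as in the question
res$outlier <- res$p.value < 0.05
res
```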