
Multivariate robust outlier detection using R

What is the preferred way (in your opinion) to perform multivariate robust outlier detection in R automatically, i.e. without manual inspection and plotting?

I have found the "dprep" package, but it seems to be discontinued. However, since outlier detection is a frequent and important task, a generic default method should be available, e.g. the MCD estimator (Rousseeuw and Van Driessen, 1999).

Try covMcd in the robustbase package.
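
To make that suggestion concrete, here is a minimal sketch of how covMcd could be used for automatic flagging. The simulated data, the variable names, and the 97.5% chi-squared cutoff are my assumptions for illustration (the cutoff is a common choice, not something the answer specifies):

```r
library(robustbase)  # provides covMcd()

set.seed(1)
# Hypothetical data: 100 clean bivariate points plus 5 planted outliers
X <- rbind(matrix(rnorm(200), ncol = 2),
           matrix(rnorm(10, mean = 8), ncol = 2))

# Robust location and scatter from the MCD estimator
mcd <- covMcd(X)

# Squared Mahalanobis distances measured against the robust estimates
d2 <- mahalanobis(X, center = mcd$center, cov = mcd$cov)

# Assumed cutoff: 97.5% quantile of a chi-squared distribution
# with as many degrees of freedom as there are variables
cutoff <- qchisq(0.975, df = ncol(X))
outliers <- which(d2 > cutoff)
```

Here outliers holds the row indices flagged without any plotting. Because the cutoff is a quantile, a few clean points will usually be flagged as well, so adjust it to how conservative you want to be.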

Use Cook's Distance

You could use Cook's distance. Cook's distance is computed from a linear regression model, so you can include multiple X variables when determining the outliers (more precisely, high-influence observations). This gives you the flexibility to add or drop the variables on which you want to determine the outliers. Computing it for every observation in R would look something like this:

mod <- lm(Y ~ X1 + X2 + X3, data=inputData)
cooksd <- cooks.distance(mod)

By a common convention, observations with a Cook's distance greater than 4 times the mean Cook's distance, i.e. cooksd > 4*mean(cooksd), are considered outliers. For more information about the formula and interpretation of Cook's distance, refer to this example.

Disclaimer: I am the author.
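
The Cook's distance rule of thumb above can be applied end to end roughly as follows. This is only a sketch: inputData here is simulated, and the planted aberrant row is my own assumption so that there is something to find.

```r
set.seed(42)
# Hypothetical data set with three predictors and a linear response
inputData <- data.frame(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50))
inputData$Y <- 1 + 2 * inputData$X1 - inputData$X2 +
  0.5 * inputData$X3 + rnorm(50, sd = 0.5)

# Plant one aberrant response in row 10 (assumption, for illustration)
inputData$Y[10] <- inputData$Y[10] + 15

mod <- lm(Y ~ X1 + X2 + X3, data = inputData)
cooksd <- cooks.distance(mod)

# Rows whose Cook's distance exceeds 4 times the mean Cook's distance
influential <- which(cooksd > 4 * mean(cooksd))
```

influential then contains the row indices to inspect or drop. Note that the result depends on which variable you treat as the response Y, since Cook's distance measures influence on that particular regression fit.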
