In R, I'm drawing a rather large boxplot from a data.frame with approximately 150 columns. I know that there are some "anomalous" columns where the distribution is too different from the rest of the data set and I want to identify which ones precisely.
Rather unsurprisingly, there is not enough room for the labels, and even if there were, it would probably be inconvenient to check by hand. So I thought I could use R's identify function to locate the offending columns. That function, however, needs x and y coordinates, and so far I have been unable to get it to work.
I tried
boxplot(dd.noctr$TGS, outline=F)
identify(xy.coords(dd.noctr$TGS)$x, y=xy.coords(dd.noctr$TGS)$y)
where dd.noctr$TGS is my data (a matrix or data.frame), only to get the warning
warning: no point within 0.25 inches
meaning that no point was identified.
Is there an alternative solution to identify column names (not single points)?
This solution seems a bit clunky, so there is probably a better one.
Set up some example data with three columns:
TGS = data.frame(A = rnorm(100), B = rnorm(100), C=rnorm(100))
Next, draw the boxplot:
boxplot(TGS, outline=FALSE)
Now call the identify function:
identify(x=rep(1:ncol(TGS), each=nrow(TGS)), y=as.vector(unlist(TGS)), labels=rep(colnames(TGS), each=nrow(TGS)))
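The labels are the column names. identify only registers a click that lands close to one of the underlying data points, and these all lie on the vertical centre line of their box, so click near the centre of a box.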
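Since the question asks for column names rather than individual points, one possible variation is to hand identify a single point per column, placed at the median of each box. This is only a sketch: it relies on the statistics that boxplot returns invisibly, and the names bp, x_pos, y_med and picked are made up for this example.

```r
set.seed(1)  # fixed seed so the example data are reproducible
TGS <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))

# boxplot() invisibly returns the statistics it used to draw the boxes
bp <- boxplot(TGS, outline = FALSE)

# One candidate point per column: x is the box position on the axis,
# y is the median (row 3 of bp$stats holds the medians)
x_pos <- seq_along(bp$names)
y_med <- bp$stats[3, ]

# identify() needs an interactive graphics device, so guard the call
if (interactive()) {
  picked <- identify(x = x_pos, y = y_med, labels = bp$names)
  print(bp$names[picked])  # names of the columns that were clicked
}
```

With one point per box there are far fewer targets than with every observation, so a click near the median line of a box should select that column.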
If you want a list of the outliers, use the out component of the object returned by boxplot.
Example: create a data frame with a few random values with mean 20, then add some outliers. This code will display the outliers.
df1 = data.frame(A = c(rnorm(15,20,3),7,8,35,32)) #15 rnorm and 4 extreme values
bplot=boxplot(df1)
bplot$out
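With many columns, the out values alone do not say which column each outlier came from. The object returned by boxplot also has group and names components, so the two can be combined. The sketch below uses made-up values (chosen so that only column A contains outliers) instead of random draws, and the names df2, outliers and flagged are hypothetical.

```r
# Made-up data: column A contains two extreme values (7 and 35), column B none
df2 <- data.frame(
  A = c(17, 18, 19, 19, 20, 20, 20, 21, 21, 22, 23, 7, 35),
  B = c(18, 19, 19, 20, 20, 20, 20, 21, 21, 22, 22, 19, 21)
)

bplot <- boxplot(df2)

# bplot$group gives, for each value in bplot$out, the index of its box;
# bplot$names gives the label of each box
outliers <- data.frame(
  column = bplot$names[bplot$group],
  value  = bplot$out,
  stringsAsFactors = FALSE
)
print(outliers)

# Columns that have at least one outlier
flagged <- unique(outliers$column)
print(flagged)
```

This only flags columns through their outlying points. A column whose whole distribution is shifted may have no outliers at all, so comparing the medians in bplot$stats across columns may be a useful complement.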