
R: use “identify” to find the column names in a boxplot

In R, I'm drawing a rather large boxplot from a data.frame with approximately 150 columns. I know that some "anomalous" columns have a distribution that differs too much from the rest of the data set, and I want to identify precisely which ones they are.

Unsurprisingly, there is not enough room for the labels, and even if there were, checking by hand would probably be inconvenient. So I thought I could use R's identify function to locate the offending columns. However, identify needs x and y coordinates, and so far I have been unable to get it to work.

I tried:

boxplot(dd.noctr$TGS, outline=F)
identify(xy.coords(dd.noctr$TGS)$x, y=xy.coords(dd.noctr$TGS)$y)

where dd.noctr$TGS is my data (a matrix or data.frame), only to get the warning

warning: no point within 0.25 inches

meaning that no point was identified.
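A plausible explanation for the warning (our reading; the question does not state it): when `xy.coords()` is given a single matrix or data frame, it takes the first column as x and the second column as y. The points handed to `identify()` therefore sit at coordinates made of the data values themselves and bear no relation to the box positions 1, 2, 3 and so on. The sketch below checks this on a small hypothetical data frame `TGS` standing in for `dd.noctr$TGS`.

```r
# Hypothetical stand-in for dd.noctr$TGS: three numeric columns.
set.seed(1)
TGS <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))

xy <- xy.coords(TGS)
# With y missing, xy.coords() uses column 1 as x and column 2 as y;
# column C is ignored, and the x values are data values, not box positions.
head(cbind(x = xy$x, A = TGS$A, y = xy$y, B = TGS$B))
```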

Is there an alternative way to identify column names (rather than single points)?

This solution seems a bit clunky, so there is probably a better one.

  1. Set up some example data with three columns:

     TGS = data.frame(A = rnorm(100), B = rnorm(100), C=rnorm(100)) 
  2. Next, plot the boxplot:

     boxplot(TGS, outline=F) 
  3. Now call identify, giving every data point the x position of its column:

     identify(x=rep(1:ncol(TGS), each=nrow(TGS)), y=as.vector(unlist(TGS)), labels=rep(colnames(TGS), each=nrow(TGS)))

    The labels are the column names. Because every point is placed on its column's x position, this only works if you click near the vertical centre line of a box.
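A possibly less clunky variant (a sketch, not part of the original answer): use the statistics that `boxplot()` returns and give `identify()` one target per box, placed at that box's median. Row 3 of the returned `stats` matrix holds the medians and `names` holds the column labels. The `interactive()` guard is only there so the script does not block when run non-interactively.

```r
set.seed(1)
TGS <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))

# Draw the plot and keep the statistics that boxplot() returns.
bp <- boxplot(TGS, outline = FALSE)

box_x <- seq_len(ncol(TGS))  # boxes are drawn at x = 1, 2, ..., ncol(TGS)
box_y <- bp$stats[3, ]       # row 3 of 'stats' = median of each column

if (interactive()) {
  # Click near a box's median line; how you stop selecting depends on the device.
  picked <- identify(box_x, box_y, labels = bp$names)
  print(bp$names[picked])    # column names of the boxes that were clicked
}
```

With 150 columns there are only 150 targets, one per box, so each click maps to exactly one column name.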


If you want a list of the outliers, you can use the 'out' component of the value returned by boxplot.

For example, create a data frame with 15 random values with mean 20 and add four outliers. This code displays the outliers:

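 df1 = data.frame(A = c(rnorm(15,20,3),7,8,35,32))   # 15 normal draws (mean 20, sd 3) plus 4 extreme values
 bplot=boxplot(df1)   # draws the boxplot and returns its statistics
 bplot$out            # the values drawn as outliers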
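The outlier values alone do not say which column they came from. The list returned by `boxplot()` also has a `group` component giving the box index of each value in `out`, so indexing `names` with it recovers the column names. A sketch, where the extra columns `B` and `C` are hypothetical additions to the example above:

```r
set.seed(42)
df2 <- data.frame(
  A = c(rnorm(15, 20, 3), 7, 8, 35, 32),  # same construction as df1
  B = rnorm(19, 20, 3),                   # hypothetical well-behaved columns
  C = rnorm(19, 20, 3)
)

# plot = FALSE computes the statistics without drawing anything.
bplot2 <- boxplot(df2, plot = FALSE)

outlier_cols <- bplot2$names[bplot2$group]  # column of each value in bplot2$out
data.frame(column = outlier_cols, value = bplot2$out)
table(factor(outlier_cols, levels = bplot2$names))  # outlier count per column
```

Whether a given extreme value is flagged depends on the 1.5 × IQR whisker rule applied to each column separately, so the counts vary with the random draws.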
