简体   繁体   English

如何从R中调整的分位数图中获取数据?

[英]How to get data from adjusted quantile plot in R?

I have a data.frame with two columns and am using aq.plot from the mvoutlier package, to identify potential outliers in my 2-D dataset. 我有一个包含两列的data.frame并使用aq.plot包中的mvoutlier来识别我的二维数据集中的潜在异常值。 The only problem is I'm not very happy with the "look" of the plots generated and would like to grab the data they are plotting and make the plot in some other software. 唯一的问题是我对生成的图表的“外观”不太满意,并且想要获取他们正在绘制的数据并在其他软件中绘制图表。

For my specific case the plot is generated by, 对于我的具体情况,情节是由,

library('mvoutlier')

data = read.csv(fp, colClasses=c("NULL",NA,NA))

h = aq.plot(data)

the data.frame, data looks like this: data.frame, data如下所示:

    pr          tas
1   5.133207    59.24362
2   20.173075   75.81661
3   24.819054   97.31020
4   35.893467   92.11203
5   27.752425   95.70120
6   25.765618   91.14163
7   20.895360   57.30519
8   8.921513    70.31467
9   36.031261   98.24573
10  27.166213   92.79554
11  8.889431    54.48514
12  59.564447   85.69632
13  43.818336   99.36451
14  43.408963   84.23207
15  22.653269   84.89939
16  21.480331   96.18303
17  22.827370   69.97202
18  23.252464   85.08739
19  14.618731   45.30504
20  40.795519   78.56758
21  37.310456   80.30799
22  31.099105   91.31675
23  33.107472   63.07043
24  9.611930    35.62702

And the generated plot looks like this: 生成的图如下所示:

在此输入图像描述

So my question is, how can I get the information plotted in the subplot on the top right? 所以我的问题是,如何在右上角的子图中绘制信息? By information I mean the x, y coordinates and number associated with each point. 根据信息,我指的是与每个点相关的x,y坐标和数字。 It would also be great if there were a way to get the x values at which the two vertical lines are drawn. 如果有办法获得绘制两条垂直线的x值,那也会很棒。

I see that the output h from calling the aq.plot() command gives a boolean array stating which points are outliers (TRUE) or not (FALSE) but there appear to be no access to the underlying components of the plot. 我看到调用aq.plot()命令的输出h给出了一个布尔数组,说明哪些点是异常值(TRUE)或不是(FALSE),但似乎没有访问绘图的底层组件。

Any help would be much appreciated. 任何帮助将非常感激。

It's all in the code for aq.plot . 它全部在aq.plot的代码中。 Here is the specific code for the plot in the upper right: 以下是右上角绘图的具体代码:

plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance", 
        ylab = "Cumulative probability", type = "n")
    text(s$x, (1:length(dist))/length(dist), as.character(s$ix), 
        col = 3, cex = 0.8)
    t <- seq(0, max(dist), by = 0.01)
    lines(t, pchisq(t, df = ncol(x)), col = 6)
    abline(v = delta, col = 5)
    text(x = delta, y = 0.4, paste(100 * (pchisq(delta, df = ncol(x))), 
        "% Quantile", sep = ""), col = 5, pos = 2, srt = 90, 
        cex = 0.8)
    xarw <- arw(x, covr$center, covr$cov, alpha = alpha)
    if (xarw$cn < Inf) {
        abline(v = xarw$cn, col = 4)
        text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4, 
            pos = 4, srt = 90, cex = 0.8)
    }

If you look through the code of the function aq.plot , you will see that you can get the x coordinates and the associated observation this way: 如果你查看函数aq.plot的代码,你会发现你可以通过这种方式得到x坐标和相关的观察:

covr <- robustbase::covMcd(data, alpha = 1/2)
dist <- mahalanobis(data, center = covr$center, cov = covr$cov)
s <- sort(dist, index = TRUE)
s$x 
#        22          4          6         10         21         18         15          5         14 
# 0.1152036  0.2181437  0.3148553  0.3255492  0.3752751  0.4076276  0.4661830  0.5299942  0.7093746 
#         9         20          3         16          2         13         17         23         12 
# 0.7564636  0.7756129  0.8838616  1.0807574  1.3059546  1.4891242  1.8606975  2.9690980  3.9152682 
#         8          7          1         11         19 
# 4.0283820  5.0767176  7.4233298  7.9488595 10.3217389 

Then the y coordinates: 然后y坐标:

(1:length(dist))/length(dist)
#[1] 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130 0.26086957 0.30434783 0.34782609
#[9] 0.39130435 0.43478261 0.47826087 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217
#[17] 0.73913043 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000

You can rebuild that plot directly using the following code altered from above. 您可以使用以上从上面更改的代码直接重建该绘图。 Reading through this code and following along as you build the plot should help you see where to find each piece of information. 阅读此代码并在构建绘图时跟随,应该可以帮助您查看在何处查找每条信息。 Look to the abline calls for info on the vertical lines, and you'll find the values here qchisq(0.975, df = ncol(data)) and here arw(data, covr$center, covr$cov, alpha = 0.05)$cn 看看垂直线上的abline调用信息,你会在这里找到值qchisq(0.975, df = ncol(data))和这里的arw(data, covr$center, covr$cov, alpha = 0.05)$cn

 plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance", 
        ylab = "Cumulative probability", type = "n")
    text(s$x, (1:length(dist))/length(dist), as.character(s$ix), 
        col = 3, cex = 0.8)
    t <- seq(0, max(dist), by = 0.01)
    lines(t, pchisq(t, df = ncol(data)), col = 6)
    abline(v = qchisq(0.975, df = ncol(data)), col = 5)
    text(x = qchisq(0.975, df = ncol(data)), 
         y = 0.4, paste(100 * (pchisq(qchisq(0.975, df = ncol(data)), df = ncol(data))), 
        "% Quantile", sep = ""), col = 5, pos = 2, srt = 90, 
        cex = 0.8)
    xarw <- arw(data, covr$center, covr$cov, alpha = 0.05)
    if (xarw$cn < Inf) {
        abline(v = xarw$cn, col = 4)
        text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4, 
            pos = 4, srt = 90, cex = 0.8)
    }

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM