
How to get data from adjusted quantile plot in R?

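I have a data.frame with two columns and am using aq.plot from the mvoutlier package to identify potential outliers in my two-dimensional dataset. The only problem is that I am not very happy with the look of the generated plots, and I would like to get the data being plotted so I can draw the charts in other software.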

In my case, the plot is generated by:

library('mvoutlier')

data = read.csv(fp, colClasses=c("NULL",NA,NA))

h = aq.plot(data)

The data.frame, data, looks like this:

    pr          tas
1   5.133207    59.24362
2   20.173075   75.81661
3   24.819054   97.31020
4   35.893467   92.11203
5   27.752425   95.70120
6   25.765618   91.14163
7   20.895360   57.30519
8   8.921513    70.31467
9   36.031261   98.24573
10  27.166213   92.79554
11  8.889431    54.48514
12  59.564447   85.69632
13  43.818336   99.36451
14  43.408963   84.23207
15  22.653269   84.89939
16  21.480331   96.18303
17  22.827370   69.97202
18  23.252464   85.08739
19  14.618731   45.30504
20  40.795519   78.56758
21  37.310456   80.30799
22  31.099105   91.31675
23  33.107472   63.07043
24  9.611930    35.62702

The resulting plot looks like this:

[Image: plot produced by aq.plot(data)]

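So my question is: how can I get the information plotted in the top-right subplot? By information I mean the x, y coordinates and the number associated with each point. If there is a way to get the x-values at which the two vertical lines are drawn, that would be great too.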

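I see that the output h of the aq.plot() call gives a boolean array saying which points are outliers (TRUE) or not (FALSE), but there seems to be no way to access the underlying components of the plot.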

Any help would be much appreciated.

It is all in the code of aq.plot. Here is the specific code for the top-right plot:

plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance", 
        ylab = "Cumulative probability", type = "n")
    text(s$x, (1:length(dist))/length(dist), as.character(s$ix), 
        col = 3, cex = 0.8)
    t <- seq(0, max(dist), by = 0.01)
    lines(t, pchisq(t, df = ncol(x)), col = 6)
    abline(v = delta, col = 5)
    text(x = delta, y = 0.4, paste(100 * (pchisq(delta, df = ncol(x))), 
        "% Quantile", sep = ""), col = 5, pos = 2, srt = 90, 
        cex = 0.8)
    xarw <- arw(x, covr$center, covr$cov, alpha = alpha)
    if (xarw$cn < Inf) {
        abline(v = xarw$cn, col = 4)
        text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4, 
            pos = 4, srt = 90, cex = 0.8)
    }

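If you look at the code of the function aq.plot, you will find that you can get the x-coordinates and the associated observation numbers this way: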

covr <- robustbase::covMcd(data, alpha = 1/2)
dist <- mahalanobis(data, center = covr$center, cov = covr$cov)
s <- sort(dist, index.return = TRUE)
s$x 
#        22          4          6         10         21         18         15          5         14 
# 0.1152036  0.2181437  0.3148553  0.3255492  0.3752751  0.4076276  0.4661830  0.5299942  0.7093746 
#         9         20          3         16          2         13         17         23         12 
# 0.7564636  0.7756129  0.8838616  1.0807574  1.3059546  1.4891242  1.8606975  2.9690980  3.9152682 
#         8          7          1         11         19 
# 4.0283820  5.0767176  7.4233298  7.9488595 10.3217389 

Then the y-coordinates:

(1:length(dist))/length(dist)
#[1] 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130 0.26086957 0.30434783 0.34782609
#[9] 0.39130435 0.43478261 0.47826087 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217
#[17] 0.73913043 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000
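Since the goal is to redraw the panel in other software, the three pieces above (observation number, x, y) can be gathered into one table and written to a file. The following is only a minimal sketch: `aq_points` is a hypothetical helper name, the toy distances are made up for illustration, and `order()` is used here in place of `sort(..., index.return = TRUE)` to obtain the observation numbers in ascending order of distance.

```r
# Hypothetical helper: collect the top-right panel's points into a data.frame.
aq_points <- function(dist) {
  ord <- order(dist)                       # observation numbers, smallest distance first
  data.frame(
    obs = ord,                             # label drawn at each point
    x   = unname(dist[ord]),               # ordered squared robust distance
    y   = seq_along(dist) / length(dist)   # cumulative probability
  )
}

# Toy distances for illustration only; in practice pass the `dist` computed above.
toy_dist <- c(2.5, 0.3, 1.1)
pts <- aq_points(toy_dist)
print(pts)

# Export for use in other software (the file name is arbitrary).
write.csv(pts, file.path(tempdir(), "aq_points.csv"), row.names = FALSE)
```

With the real data, calling `aq_points(dist)` on the robust distances from `mahalanobis()` should give one row per observation, ready to be read by any plotting tool.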

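You can rebuild the plot directly with the code below, which is adapted from the code above. Reading this code and following along as the plot is built should help you see where to find each piece of information. For the vertical lines, look at the abline calls: the values are qchisq(0.975, df = ncol(data)) for the first line and arw(data, covr$center, covr$cov, alpha = 0.05)$cn for the second.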

 plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance", 
        ylab = "Cumulative probability", type = "n")
    text(s$x, (1:length(dist))/length(dist), as.character(s$ix), 
        col = 3, cex = 0.8)
    t <- seq(0, max(dist), by = 0.01)
    lines(t, pchisq(t, df = ncol(data)), col = 6)
    abline(v = qchisq(0.975, df = ncol(data)), col = 5)
    text(x = qchisq(0.975, df = ncol(data)), 
         y = 0.4, paste(100 * (pchisq(qchisq(0.975, df = ncol(data)), df = ncol(data))), 
        "% Quantile", sep = ""), col = 5, pos = 2, srt = 90, 
        cex = 0.8)
    xarw <- arw(data, covr$center, covr$cov, alpha = 0.05)
    if (xarw$cn < Inf) {
        abline(v = xarw$cn, col = 4)
        text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4, 
            pos = 4, srt = 90, cex = 0.8)
    }
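For plotting elsewhere, the first vertical line and the reference curve can also be computed as plain numbers with base R alone. This is a rough sketch under two assumptions: `p = 2` because the question's data has two columns (pr and tas), and `max_dist` is taken from the largest distance printed in the `s$x` output above. The adjusted quantile line is not computed here because it needs `arw()` from mvoutlier, as shown in the code above.

```r
# Number of variables; assumed to be 2 to match the question's data (pr, tas).
p <- 2

# x-position of the first vertical line (the "97.5% Quantile" label).
delta <- qchisq(0.975, df = p)
print(delta)

# Reference curve drawn by lines(t, pchisq(t, df = ...)): the chi-square CDF.
# max_dist is copied from the largest value in the printed s$x output.
max_dist <- 10.3217389
curve_df <- data.frame(t = seq(0, max_dist, by = 0.01))
curve_df$cdf <- pchisq(curve_df$t, df = p)

# Export the curve for use in other software (the file name is arbitrary).
write.csv(curve_df, file.path(tempdir(), "aq_curve.csv"), row.names = FALSE)
```

For two variables the chi-square quantile has the closed form -2 * log(0.025), so `delta` is roughly 7.38.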

