How to get data from adjusted quantile plot in R?
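I have a data.frame with two columns and am using aq.plot from the mvoutlier package to identify potential outliers in my two-dimensional dataset. The only problem is that I am not happy with the look of the resulting plots, so I would like to get the data being plotted and draw the plots in other software.
For my specific case, the plot is generated by: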
library('mvoutlier')
data = read.csv(fp, colClasses=c("NULL",NA,NA))
h = aq.plot(data)
The data.frame, data, looks like this:
pr tas
1 5.133207 59.24362
2 20.173075 75.81661
3 24.819054 97.31020
4 35.893467 92.11203
5 27.752425 95.70120
6 25.765618 91.14163
7 20.895360 57.30519
8 8.921513 70.31467
9 36.031261 98.24573
10 27.166213 92.79554
11 8.889431 54.48514
12 59.564447 85.69632
13 43.818336 99.36451
14 43.408963 84.23207
15 22.653269 84.89939
16 21.480331 96.18303
17 22.827370 69.97202
18 23.252464 85.08739
19 14.618731 45.30504
20 40.795519 78.56758
21 37.310456 80.30799
22 31.099105 91.31675
23 33.107472 63.07043
24 9.611930 35.62702
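The resulting plot looks like this:
So my question is: how do I get the information plotted in the upper-right subplot? By information, I mean the x, y coordinates and the number associated with each point. If there is a way to get the x values at which the two vertical lines are drawn, that would be great too.
I see that the output h of the call to aq.plot() gives a Boolean array saying which points are outliers (TRUE) or not (FALSE), but there seems to be no way to access the underlying components of the plot.
Any help would be greatly appreciated.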
It is all in the code of aq.plot. Here is the specific code for the upper-right plot:
plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance",
ylab = "Cumulative probability", type = "n")
text(s$x, (1:length(dist))/length(dist), as.character(s$ix),
col = 3, cex = 0.8)
t <- seq(0, max(dist), by = 0.01)
lines(t, pchisq(t, df = ncol(x)), col = 6)
abline(v = delta, col = 5)
text(x = delta, y = 0.4, paste(100 * (pchisq(delta, df = ncol(x))),
"% Quantile", sep = ""), col = 5, pos = 2, srt = 90,
cex = 0.8)
xarw <- arw(x, covr$center, covr$cov, alpha = alpha)
if (xarw$cn < Inf) {
abline(v = xarw$cn, col = 4)
text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4,
pos = 4, srt = 90, cex = 0.8)
}
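If you look at the code of the function aq.plot, you will see that you can get the x coordinates and the associated observation numbers this way: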
covr <- robustbase::covMcd(data, alpha = 1/2)
dist <- mahalanobis(data, center = covr$center, cov = covr$cov)
s <- sort(dist, index = TRUE)
s$x
# 22 4 6 10 21 18 15 5 14
# 0.1152036 0.2181437 0.3148553 0.3255492 0.3752751 0.4076276 0.4661830 0.5299942 0.7093746
# 9 20 3 16 2 13 17 23 12
# 0.7564636 0.7756129 0.8838616 1.0807574 1.3059546 1.4891242 1.8606975 2.9690980 3.9152682
# 8 7 1 11 19
# 4.0283820 5.0767176 7.4233298 7.9488595 10.3217389
Then the y coordinates:
(1:length(dist))/length(dist)
#[1] 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130 0.26086957 0.30434783 0.34782609
#[9] 0.39130435 0.43478261 0.47826087 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217
#[17] 0.73913043 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000
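To take these values into other software, the x coordinates, y coordinates and point labels can be collected into one data.frame and written to a file. The following is only a minimal sketch: the names `dat` and `pts` and the output file name are illustrative, and `dat` holds just the first 12 rows of the data from the question, so the numbers will differ from the output shown above.

```r
# Sketch: gather what the upper-right panel draws into one data.frame.
# `dat`, `pts` and the file name are illustrative, not part of mvoutlier.
library(robustbase)

# First 12 rows of the data shown in the question
dat <- data.frame(
  pr  = c(5.133207, 20.173075, 24.819054, 35.893467, 27.752425, 25.765618,
          20.895360, 8.921513, 36.031261, 27.166213, 8.889431, 59.564447),
  tas = c(59.24362, 75.81661, 97.31020, 92.11203, 95.70120, 91.14163,
          57.30519, 70.31467, 98.24573, 92.79554, 54.48514, 85.69632)
)

set.seed(1)  # covMcd draws random subsets, so fix the seed for repeatability
covr <- covMcd(dat, alpha = 1/2)
dist <- mahalanobis(dat, center = covr$center, cov = covr$cov)
s <- sort(dist, index.return = TRUE)

pts <- data.frame(
  obs = s$ix,                           # row number printed at each point
  x   = unname(s$x),                    # ordered squared robust distance
  y   = seq_along(dist) / length(dist)  # cumulative probability
)

# Write the points to a CSV file that other software can read
write.csv(pts, file.path(tempdir(), "aq_plot_points.csv"), row.names = FALSE)
```

Each row of `pts` corresponds to one label in the plot: `obs` is the text drawn and (`x`, `y`) is where it is drawn.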
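You can rebuild the plot directly using the code below, adapted from the code above. Reading this code and following along as the plot is built should help you see where to find each piece of information. Look at the abline calls for the vertical lines: the first is drawn at qchisq(0.975, df = ncol(data)) and the second at arw(data, covr$center, covr$cov, alpha = 0.05)$cn.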
plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance",
ylab = "Cumulative probability", type = "n")
text(s$x, (1:length(dist))/length(dist), as.character(s$ix),
col = 3, cex = 0.8)
t <- seq(0, max(dist), by = 0.01)
lines(t, pchisq(t, df = ncol(data)), col = 6)
abline(v = qchisq(0.975, df = ncol(data)), col = 5)
text(x = qchisq(0.975, df = ncol(data)),
y = 0.4, paste(100 * (pchisq(qchisq(0.975, df = ncol(data)), df = ncol(data))),
"% Quantile", sep = ""), col = 5, pos = 2, srt = 90,
cex = 0.8)
xarw <- arw(data, covr$center, covr$cov, alpha = 0.05)
if (xarw$cn < Inf) {
abline(v = xarw$cn, col = 4)
text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4,
pos = 4, srt = 90, cex = 0.8)
}
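If the theoretical curve and the fixed quantile line are also needed as plain numbers, they can be computed with base R alone. This is a rough sketch under assumptions: `p`, `max_dist` and `curve_df` are illustrative names, `p = 2` reflects the two columns pr and tas, and `max_dist` is copied from the largest distance in the sorted output above. The adjusted quantile line is not recomputed here; its position is the `xarw$cn` value from the arw call.

```r
# Sketch: data behind the chi-square curve and the fixed quantile line.
# All names here are illustrative; only base R (stats) functions are used.
p <- 2                           # number of columns in the data (pr, tas)
delta <- qchisq(0.975, df = p)   # x position of the "97.5% Quantile" line

max_dist <- 10.3217389           # largest squared distance in the output above
curve_df <- data.frame(t = seq(0, max_dist, by = 0.01))
curve_df$cdf <- pchisq(curve_df$t, df = p)  # y values of the theoretical curve

delta
head(curve_df)
```

`curve_df` can be exported with write.csv in the same way as the point coordinates, and `delta` gives the x value of the first vertical line.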