Remove outlier values from a data.frame using R

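I have a data.frame with columns of water quality values.

I want to remove the outliers from each column and put NA in place of the removed values.

Edit:

I want to remove outliers defined as follows:

outlier > 95th percentile (0.95 quantile)

outlier < 5th percentile (0.05 quantile)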

How can I do that?

Here is an example that illustrates my situation:

df=read.table(text="st  PH  OD  COD N
A   7.3 1.26301094  1.112359589 0.295842925
B   12.69875867 5.670646078 4.841748321 0.096958426
C   9.613564343 1.706277385 7.952266541 0.102672152
D   9.693461149 7.075560183 0.283503075 0.302494648
A   11.2031501  5.444756127 3.133271063 0.421172108
B   9.288552402 4.169068095 10.54049312 0.122900615
C   4.207333379 6.717653051 10.49073885 0.085634135
D   10.98593946 2.352068972 8.468436777 0.142284793
A   8.20679887  7.826764274 4.464242367 0.211200956
B   12.9165421  0.909886436 1.488358471 0.001640961
C   3.971088246 8.500668307 6.315208679 0.319835127
D   4.821068685 3.871082236 8.669284239 0.349317325
A   0.431563127 0.978922921 10.53756208 0.111929377
B   7.546887828 9.946840115 1.584013576 0.426681716
C   4.689617182 8.717656795 7.474709944 0.473463497
D   9.730568456 1.134763618 4.679810195 0.215744107
A   12.06381259 6.862549062 0.559497593 0.231984105
", 
sep = "", header = TRUE)

Using apply, quantile, and dplyr::na_if, you can:

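library(dplyr)  # provides na_if()

# Make sure every measurement column is numeric
df[-1] <- apply(df[-1], 2, as.numeric)
# Replace the value below the 5th percentile with NA
df[-1] <- apply(df[-1], 2,
                 function(x) na_if(x, x[which(x < quantile(x, probs = 0.05))]))
# Replace the value above the 95th percentile with NA (ignoring the NAs just added)
df[-1] <- apply(df[-1], 2,
                 function(x) na_if(x, x[which(x > quantile(x, probs = 0.95, na.rm = TRUE))]))
df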

   st        PH        OD        COD          N
1   A  7.300000 1.2630109  1.1123596 0.29584292
2   B 12.698759 5.6706461  4.8417483 0.09695843
3   C  9.613564 1.7062774  7.9522665 0.10267215
4   D  9.693461 7.0755602         NA 0.30249465
5   A 11.203150 5.4447561  3.1332711 0.42117211
6   B  9.288552 4.1690681         NA 0.12290062
7   C  4.207333 6.7176531 10.4907388 0.08563414
8   D 10.985939 2.3520690  8.4684368 0.14228479
9   A  8.206799 7.8267643  4.4642424 0.21120096
10  B        NA        NA  1.4883585         NA
11  C  3.971088 8.5006683  6.3152087 0.31983513
12  D  4.821069 3.8710822  8.6692842 0.34931733
13  A        NA 0.9789229 10.5375621 0.11192938
14  B  7.546888        NA  1.5840136 0.42668172
15  C  4.689617 8.7176568  7.4747099         NA
16  D  9.730568 1.1347636  4.6798102 0.21574411
17  A 12.063813 6.8625491  0.5594976 0.23198410
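
In this 17-row example exactly one value per column falls below the 5th percentile and one above the 95th, so na_if receives a single comparison value each time. As far as I understand, na_if expects that comparison value to have length one (or the same length as x), so the call may fail on columns where no value or several values are flagged. A minimal base R sketch that avoids this, assuming both cutoffs should be computed from the original column; the helper name mask_outside_quantiles is hypothetical and not part of the answer:

```r
# Hypothetical helper (not from the original answer): replace values outside
# the [lower, upper] quantile range with NA using plain indexing.
mask_outside_quantiles <- function(x, lower = 0.05, upper = 0.95) {
  # Both cutoffs come from the original column; existing NAs are ignored.
  cutoffs <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  # which() drops NA comparisons, so existing NAs are left as they are,
  # and an empty index simply changes nothing.
  x[which(x < cutoffs[[1]] | x > cutoffs[[2]])] <- NA
  x
}

# PH column copied from the question's example data
ph <- c(7.3, 12.69875867, 9.613564343, 9.693461149, 11.2031501,
        9.288552402, 4.207333379, 10.98593946, 8.20679887, 12.9165421,
        3.971088246, 4.821068685, 0.431563127, 7.546887828, 4.689617182,
        9.730568456, 12.06381259)

ph_clean <- mask_outside_quantiles(ph)
which(is.na(ph_clean))  # positions of the masked values
```

On the question's data this masks rows 10 and 13 of PH, matching the output above. To apply it to the whole data.frame, something like df[-1] <- lapply(df[-1], mask_outside_quantiles) should work.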
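
Another option is a small helper function applied to each numeric column:

# Replace values outside the [lq, uq] quantile range with NA
rm_outlier <- function(x, lq = 5/100, uq = 95/100) {
  qnts <- quantile(x, probs = c(lq, uq))
  ifelse(x < qnts[1] | x > qnts[2], NA, x)
}
# Apply to every column except the station label, then bind the results
do.call(cbind.data.frame, lapply(df[, -1], rm_outlier))
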
          PH        OD        COD          N
1   7.300000 1.2630109  1.1123596 0.29584292
2  12.698759 5.6706461  4.8417483 0.09695843
3   9.613564 1.7062774  7.9522665 0.10267215
4   9.693461 7.0755602         NA 0.30249465
5  11.203150 5.4447561  3.1332711 0.42117211
6   9.288552 4.1690681         NA 0.12290062
7   4.207333 6.7176531 10.4907388 0.08563414
8  10.985939 2.3520690  8.4684368 0.14228479
9   8.206799 7.8267643  4.4642424 0.21120096
10        NA        NA  1.4883585         NA
11  3.971088 8.5006683  6.3152087 0.31983513
12  4.821069 3.8710822  8.6692842 0.34931733
13        NA 0.9789229 10.5375621 0.11192938
14  7.546888        NA  1.5840136 0.42668172
15  4.689617 8.7176568  7.4747099         NA
16  9.730568 1.1347636  4.6798102 0.21574411
17 12.063813 6.8625491  0.5594976 0.23198410
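
Note that the result above no longer contains the st column. A minimal sketch of one way to keep it, assuming the cleaned values should be written back into the original data.frame: the function body is copied from the answer with na.rm = TRUE added (an assumption, so that columns already containing NA do not raise an error), and the toy data frame is made up for illustration.

```r
# rm_outlier as in the answer above, with na.rm = TRUE added (assumption)
rm_outlier <- function(x, lq = 0.05, uq = 0.95) {
  qnts <- quantile(x, probs = c(lq, uq), na.rm = TRUE)
  ifelse(x < qnts[1] | x > qnts[2], NA, x)
}

# Hypothetical toy data, not the question's measurements
toy <- data.frame(st = c("A", "B", "C", "D", "A"),
                  PH = c(1, 5, 6, 7, 20),
                  OD = c(2, 3, 4, 5, 6))

# Select the numeric columns and overwrite only those, leaving st in place
num_cols <- vapply(toy, is.numeric, logical(1))
toy[num_cols] <- lapply(toy[num_cols], rm_outlier)
toy
```

Selecting columns with is.numeric rather than by position is a design choice for this sketch; with the question's data, df[-1] <- lapply(df[-1], rm_outlier) would do the same thing.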
