Trim extreme values in R

I'm working with a large data set and I'm trying to trim the extreme values, such as those < 0 and > 8. How can I do that in R?

Atrim = c(0,8)

You were not clear about how to handle the outliers. Should they be eliminated or changed to NA? Here is a simple example that deletes the extreme values:

set.seed(42) # for reproducibility
x <- rnorm(50, 4, 2.5)   # Generate 50 random values
y <- x[x >= 0 & x <= 8]  # Remove outliers
length(y)
# [1] 41  # Nine values were removed.
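If the limits are stored in a vector such as the `Atrim = c(0,8)` from the question, the same idea can be wrapped in a small function. This is only a sketch: `trim_range()` and its `replace_na` argument are hypothetical names introduced here for illustration, not part of base R or of the original answer.

```r
# Hypothetical helper (not from the original answer):
# drop values outside [limits[1], limits[2]], or mark them as NA instead.
trim_range <- function(x, limits, replace_na = FALSE) {
  # TRUE where a non-missing value falls outside the allowed range
  out_of_range <- !is.na(x) & (x < limits[1] | x > limits[2])
  if (replace_na) {
    x[out_of_range] <- NA   # keep the original length and shape
    x
  } else {
    x[!out_of_range]        # drop outliers (returns a plain vector)
  }
}

Atrim <- c(0, 8)            # limits taken from the question
set.seed(42)                # for reproducibility
x <- rnorm(50, 4, 2.5)

kept   <- trim_range(x, Atrim)                     # outliers removed
marked <- trim_range(x, Atrim, replace_na = TRUE)  # outliers set to NA

length(kept)
sum(is.na(marked))
```

Dropping values changes the length of the result, so the NA variant is probably the safer choice when `x` is a column that must stay aligned with other columns.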

If the data are in a matrix and you want to replace the outliers with NA, try this:

set.seed(42) # for reproducibility
x <- matrix(rnorm(49, 4, 2.5), 7, 7)   # Generate a 7 x 7 matrix of random values
print(x, digits=3)  # Print with 3 significant digits
#      [,1]  [,2]  [,3]   [,4] [,5]   [,6]  [,7]
# [1,] 7.43 3.763  3.67 -0.453 5.15 -0.293 5.895
# [2,] 2.59 9.046  5.59  3.570 2.40  2.039 2.183
# [3,] 4.91 3.843  3.29  7.037 5.14  1.873 0.579
# [4,] 5.58 7.262 -2.64  8.738 5.76 -2.036 5.082
# [5,] 5.01 9.717 -2.10  2.924 6.59  4.090 1.972
# [6,] 3.73 0.528  7.30  3.357 2.48  4.515 7.610
# [7,] 7.78 3.303  3.23 -0.408 5.26  3.097 2.921
idx <- which(x < 0 | x > 8, arr.ind=TRUE)   # Identify outliers
x[idx] <- NA
print(x, digits=3)
#      [,1]  [,2] [,3] [,4] [,5] [,6]  [,7]
# [1,] 7.43 3.763 3.67   NA 5.15   NA 5.895
# [2,] 2.59    NA 5.59 3.57 2.40 2.04 2.183
# [3,] 4.91 3.843 3.29 7.04 5.14 1.87 0.579
# [4,] 5.58 7.262   NA   NA 5.76   NA 5.082
# [5,] 5.01    NA   NA 2.92 6.59 4.09 1.972
# [6,] 3.73 0.528 7.30 3.36 2.48 4.51 7.610
# [7,] 7.78 3.303 3.23   NA 5.26 3.10 2.921
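As a possible shortcut, a logical mask can be used for the assignment directly, without calling `which()` first. The sketch below assumes the same simulated matrix as above; `x_mask` is just an illustrative variable name.

```r
set.seed(42)                          # same seed as the example above
x <- matrix(rnorm(49, 4, 2.5), 7, 7)  # same 7 x 7 matrix

# Assign NA through a logical mask instead of which(..., arr.ind = TRUE)
x_mask <- x
x_mask[x_mask < 0 | x_mask > 8] <- NA

# Count the trimmed values in each column
colSums(is.na(x_mask))

# Summaries need na.rm = TRUE once NAs are present
colMeans(x_mask, na.rm = TRUE)
```

Both forms should give the same matrix. The `which()` version is mainly useful when you also want to inspect the row and column positions of the outliers.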
