Trim extreme values in R
I'm working with a large dataset and I'm trying to trim the extreme values, i.e. those < 0 and > 8. How can I do that in R?
Atrim = c(0,8)
You were not clear about how to handle the outliers. Should they be eliminated or changed to NAs? Here is a simple example that deletes the extreme values:
set.seed(42) # for reproducibility
x <- rnorm(50, 4, 2.5) # Generate 50 random values
y <- x[x >= 0 & x <= 8] # Remove outliers
length(y)
# [1] 41 # Nine values were removed.
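The bounds could also be taken from a vector like the `Atrim` in the question instead of being hard-coded. Here is a minimal sketch of that idea. Note that `trim_range()` and its `replace_with_na` argument are hypothetical names made up for this illustration (they are not base R), and the data vector `v` is invented:

```r
# Hypothetical helper (not part of base R): keep values inside
# [limits[1], limits[2]], either dropping the rest or setting them to NA.
trim_range <- function(v, limits, replace_with_na = FALSE) {
  stopifnot(is.numeric(v), is.numeric(limits), length(limits) == 2)
  # Existing NAs are not counted as outliers
  outside <- !is.na(v) & (v < limits[1] | v > limits[2])
  if (replace_with_na) {
    v[outside] <- NA
    return(v)
  }
  v[!outside]
}

Atrim <- c(0, 8)                    # bounds from the question
v <- c(-1.5, 0, 3.2, 8, 9.7, NA)    # made-up illustration data

trim_range(v, Atrim)                          # drops -1.5 and 9.7
trim_range(v, Atrim, replace_with_na = TRUE)  # same length, outliers become NA
```

As in the example above, the bounds themselves (0 and 8) are kept. Whether existing NAs should be kept or dropped is a choice this sketch makes arbitrarily; adjust it to your data.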
If the data are in a matrix and you want to replace the outliers with NA, try this:
set.seed(42) # for reproducibility
x <- matrix(rnorm(49, 4, 2.5), 7, 7) # Generate a 7 x 7 matrix of random values
print(x, digits=3) # Print with 3 significant digits
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 7.43 3.763 3.67 -0.453 5.15 -0.293 5.895
# [2,] 2.59 9.046 5.59 3.570 2.40 2.039 2.183
# [3,] 4.91 3.843 3.29 7.037 5.14 1.873 0.579
# [4,] 5.58 7.262 -2.64 8.738 5.76 -2.036 5.082
# [5,] 5.01 9.717 -2.10 2.924 6.59 4.090 1.972
# [6,] 3.73 0.528 7.30 3.357 2.48 4.515 7.610
# [7,] 7.78 3.303 3.23 -0.408 5.26 3.097 2.921
idx <- which(x < 0 | x > 8, arr.ind=TRUE) # Identify outliers
x[idx] <- NA
print(x, digits=3)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 7.43 3.763 3.67 NA 5.15 NA 5.895
# [2,] 2.59 NA 5.59 3.57 2.40 2.04 2.183
# [3,] 4.91 3.843 3.29 7.04 5.14 1.87 0.579
# [4,] 5.58 7.262 NA NA 5.76 NA 5.082
# [5,] 5.01 NA NA 2.92 6.59 4.09 1.972
# [6,] 3.73 0.528 7.30 3.36 2.48 4.51 7.610
# [7,] 7.78 3.303 3.23 NA 5.26 3.10 2.921
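A logical matrix can also be used as the index directly, which may be a slightly shorter way to do the same replacement, and it is often useful to count how many values were replaced. A small sketch on a made-up 2 x 3 matrix `m` (the data are invented for illustration):

```r
# Made-up 2 x 3 matrix, filled column by column
m <- matrix(c(-1, 2, 9, 4, 5, 8.5), nrow = 2)

m[m < 0 | m > 8] <- NA      # logical matrix index, no which() needed

sum(is.na(m))               # total number of values replaced
colSums(is.na(m))           # replaced values per column
colMeans(m, na.rm = TRUE)   # later summaries need na.rm = TRUE
```

Once outliers are NA, most summary functions (`mean`, `sum`, `colMeans`, ...) return NA unless `na.rm = TRUE` is passed.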