How can I remove rows with Inf from my data frame in R?
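I have a very large data frame (ICS_data) with about 129 columns (variables) and 5276 rows. Some rows contain Inf values in one or more variables. I have already used na.omit(df) to remove the rows with NA and NaN, but I still get errors. When I searched SO for similar errors, I found this code as a possible solution:
ICS_data[is.finite(rowSums(ICS_data)),]
However, when I run it on my data frame, I get another error message:
> powerdata <- ICS_data[is.finite(rowSums(ICS_data)),]
Error in rowSums(ICS_data): 'x' must be numeric
I checked my data set, and every column is numeric except my reference variable, which is a factor. Can someone help me?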
> sapply(ICS_data, class)
R1.PA1.VH R1.PM1.V R1.PA2.VH R1.PM2.V
"numeric" "numeric" "numeric" "numeric"
R1.PA3.VH R1.PM3.V R1.PA4.IH R1.PM4.I
"numeric" "numeric" "numeric" "numeric"
R1.PA5.IH R1.PM5.I R1.PA6.IH R1.PM6.I
"numeric" "numeric" "numeric" "numeric"
R1.PA7.VH R1.PM7.V R1.PA8.VH R1.PM8.V
"numeric" "numeric" "numeric" "numeric"
R1.PA9.VH R1.PM9.V R1.PA10.IH R1.PM10.I
"numeric" "numeric" "numeric" "numeric"
R1.PA11.IH R1.PM11.I R1.PA12.IH R1.PM12.I
"numeric" "numeric" "numeric" "numeric"
R1.F R1.DF R1.PA.Z R1.PA.ZH
"numeric" "numeric" "numeric" "numeric"
R1.S R2.PA1.VH R2.PM1.V R2.PA2.VH
"numeric" "numeric" "numeric" "numeric"
R2.PM2.V R2.PA3.VH R2.PM3.V R2.PA4.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM4.I R2.PA5.IH R2.PM5.I R2.PA6.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM6.I R2.PA7.VH R2.PM7.V R2.PA8.VH
"numeric" "numeric" "numeric" "numeric"
R2.PM8.V R2.PA9.VH R2.PM9.V R2.PA10.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM10.I R2.PA11.IH R2.PM11.I R2.PA12.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM12.I R2.F R2.DF R2.PA.Z
"numeric" "numeric" "numeric" "numeric"
R2.PA.ZH R2.S R3.PA1.VH R3.PM1.V
"numeric" "numeric" "numeric" "numeric"
R3.PA2.VH R3.PM2.V R3.PA3.VH R3.PM3.V
"numeric" "numeric" "numeric" "numeric"
R3.PA4.IH R3.PM4.I R3.PA5.IH R3.PM5.I
"numeric" "numeric" "numeric" "numeric"
R3.PA6.IH R3.PM6.I R3.PA7.VH R3.PM7.V
"numeric" "numeric" "numeric" "numeric"
R3.PA8.VH R3.PM8.V R3.PA9.VH R3.PM9.V
"numeric" "numeric" "numeric" "numeric"
R3.PA10.IH R3.PM10.I R3.PA11.IH R3.PM11.I
"numeric" "numeric" "numeric" "numeric"
R3.PA12.IH R3.PM12.I R3.F R3.DF
"numeric" "numeric" "numeric" "numeric"
R3.PA.Z R3.PA.ZH R3.S R4.PA1.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM1.V R4.PA2.VH R4.PM2.V R4.PA3.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM3.V R4.PA4.IH R4.PM4.I R4.PA5.IH
"numeric" "numeric" "numeric" "numeric"
R4.PM5.I R4.PA6.IH R4.PM6.I R4.PA7.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM7.V R4.PA8.VH R4.PM8.V R4.PA9.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM9.V R4.PA10.IH R4.PM10.I R4.PA11.IH
"numeric" "numeric" "numeric" "numeric"
R4.PM11.I R4.PA12.IH R4.PM12.I R4.F
"numeric" "numeric" "numeric" "numeric"
R4.DF R4.PA.Z R4.PA.ZH R4.S
"numeric" "numeric" "numeric" "numeric"
control_panel_log1 control_panel_log2 control_panel_log3 control_panel_log4
"numeric" "numeric" "numeric" "numeric"
relay1_log relay2_log relay3_log relay4_log
"numeric" "numeric" "numeric" "numeric"
snort_log1 snort_log2 snort_log3 snort_log4
"numeric" "numeric" "numeric" "numeric"
marker
"factor"
To remove the rows that contain Inf values, you can use:
ICS_data[rowSums(sapply(ICS_data[-ncol(ICS_data)], is.infinite)) == 0, ]
Or with dplyr (filter_at() is superseded in recent dplyr versions but still works):
library(dplyr)
ICS_data %>% filter_at(-ncol(.), all_vars(is.finite(.)))
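Both one-liners above assume that the only non-numeric column is the last one. As a hedged sketch of a variant that does not depend on column position, the numeric columns can be picked by type with is.numeric. The data frame df and its column names are made up for illustration and are not from the question.

```r
# Toy data (hypothetical): the factor column sits in the middle, not at the end.
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(2, Inf, 4, 5),
                 marker = factor(c("x", "y", "x", "y")),
                 c = c(-Inf, 1, 2, 3))

# Identify numeric columns by type rather than by position.
num_cols <- vapply(df, is.numeric, logical(1))

# Count the infinite values in each row, looking at numeric columns only.
inf_count <- rowSums(sapply(df[num_cols], is.infinite))

# Keep the rows that have no infinite values.
clean <- df[inf_count == 0, ]
clean
```

This keeps the factor column in the result while ignoring it during the check, wherever it appears in the data frame.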
We can break the code into smaller steps to understand how it works.
Consider this data.
data <- data.frame(a = 1:4, b = 2:5, c = letters[1:4], stringsAsFactors = TRUE)
data$b[2] <- Inf
data
# a b c
#1 1 2 a
#2 2 Inf b
#3 3 4 c
#4 4 5 d
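First, we drop the last column from data. We drop it because the last column is a factor, and we do not want to include it when looking for infinite values. That leaves only the numeric columns.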
data[-ncol(data)]
# a b
#1 1 2
#2 2 Inf
#3 3 4
#4 4 5
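To see why the factor column has to be left out, here is a small sketch that reproduces the situation from the question on the toy data. The tryCatch wrapper is only there so the script keeps running and the error message can be inspected.

```r
data <- data.frame(a = 1:4, b = 2:5, c = letters[1:4], stringsAsFactors = TRUE)
data$b[2] <- Inf

# With the factor column included, rowSums() cannot treat the data as numeric
# and signals an error; capture its message instead of stopping.
res <- tryCatch(rowSums(data), error = function(e) conditionMessage(e))
res

# With the factor column dropped, the same call succeeds.
sums <- rowSums(data[-ncol(data)])
sums
```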
Next, with sapply, we apply is.infinite to each column to find out which values are infinite. This returns a matrix of TRUE / FALSE values.
sapply(data[-ncol(data)], is.infinite)
# a b
#[1,] FALSE FALSE
#[2,] FALSE TRUE
#[3,] FALSE FALSE
#[4,] FALSE FALSE
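Note that is.infinite and is.finite are not exact opposites, which matters when choosing between the base R and dplyr versions. A minimal sketch on a made-up vector x:

```r
x <- c(1, Inf, -Inf, NA, NaN)

# TRUE only for Inf and -Inf; NA and NaN are not flagged.
inf_flags <- is.infinite(x)
inf_flags

# TRUE only for ordinary numbers; Inf, -Inf, NA and NaN are all FALSE.
fin_flags <- is.finite(x)
fin_flags
```

So a filter based on is.infinite(...) == 0 removes only the rows with Inf or -Inf, while a filter based on is.finite also removes rows that contain NA or NaN.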
We can sum these logical values with rowSums. Here TRUE counts as 1 and FALSE counts as 0.
rowSums(sapply(data[-ncol(data)], is.infinite))
#[1] 0 1 0 0
From this we know that the second row has 1 infinite value, so we need to remove it. We therefore select the rows that have 0 infinite values.
data[rowSums(sapply(data[-ncol(data)], is.infinite)) == 0, ]
# a b c
#1 1 2 a
#3 3 4 c
#4 4 5 d
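The snippet from the question can also be made to work with the same idea of leaving out the factor column. This is a sketch on the toy data, not a drop-in replacement for the step-by-step version: it relies on a row sum being non-finite whenever the row contains Inf, -Inf, NA or NaN, so it removes rows with missing values too.

```r
data <- data.frame(a = 1:4, b = 2:5, c = letters[1:4], stringsAsFactors = TRUE)
data$b[2] <- Inf

# Sum only the numeric columns; the sum is finite only if every value
# in the row is an ordinary number.
keep <- is.finite(rowSums(data[-ncol(data)]))

result <- data[keep, ]
result
```

One caveat of this shortcut is that extremely large finite values could overflow to Inf when summed, in which case counting infinite values per row is the safer choice.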