How can I remove rows with Inf from my dataframe in R?
I have a very large dataframe (ICS_data) with about 129 columns (variables) and 5276 rows. Some of the rows contain Inf values in one or more variables. I have used na.omit(df) to remove rows with NA and NaN, but I am still getting errors. When I searched SO for a similar error, I found this code as a possible solution:
ICS_data[is.finite(rowSums(ICS_data)),]
but when I run it on my dataframe, I get another error message:
> powerdata <- ICS_data[is.finite(rowSums(ICS_data)),]
Error in rowSums(ICS_data): 'x' must be numeric
I have checked my dataset and all the columns are numeric except my reference variable, which is a factor. Can someone please help me out?
> sapply(ICS_data, class)
R1.PA1.VH R1.PM1.V R1.PA2.VH R1.PM2.V
"numeric" "numeric" "numeric" "numeric"
R1.PA3.VH R1.PM3.V R1.PA4.IH R1.PM4.I
"numeric" "numeric" "numeric" "numeric"
R1.PA5.IH R1.PM5.I R1.PA6.IH R1.PM6.I
"numeric" "numeric" "numeric" "numeric"
R1.PA7.VH R1.PM7.V R1.PA8.VH R1.PM8.V
"numeric" "numeric" "numeric" "numeric"
R1.PA9.VH R1.PM9.V R1.PA10.IH R1.PM10.I
"numeric" "numeric" "numeric" "numeric"
R1.PA11.IH R1.PM11.I R1.PA12.IH R1.PM12.I
"numeric" "numeric" "numeric" "numeric"
R1.F R1.DF R1.PA.Z R1.PA.ZH
"numeric" "numeric" "numeric" "numeric"
R1.S R2.PA1.VH R2.PM1.V R2.PA2.VH
"numeric" "numeric" "numeric" "numeric"
R2.PM2.V R2.PA3.VH R2.PM3.V R2.PA4.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM4.I R2.PA5.IH R2.PM5.I R2.PA6.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM6.I R2.PA7.VH R2.PM7.V R2.PA8.VH
"numeric" "numeric" "numeric" "numeric"
R2.PM8.V R2.PA9.VH R2.PM9.V R2.PA10.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM10.I R2.PA11.IH R2.PM11.I R2.PA12.IH
"numeric" "numeric" "numeric" "numeric"
R2.PM12.I R2.F R2.DF R2.PA.Z
"numeric" "numeric" "numeric" "numeric"
R2.PA.ZH R2.S R3.PA1.VH R3.PM1.V
"numeric" "numeric" "numeric" "numeric"
R3.PA2.VH R3.PM2.V R3.PA3.VH R3.PM3.V
"numeric" "numeric" "numeric" "numeric"
R3.PA4.IH R3.PM4.I R3.PA5.IH R3.PM5.I
"numeric" "numeric" "numeric" "numeric"
R3.PA6.IH R3.PM6.I R3.PA7.VH R3.PM7.V
"numeric" "numeric" "numeric" "numeric"
R3.PA8.VH R3.PM8.V R3.PA9.VH R3.PM9.V
"numeric" "numeric" "numeric" "numeric"
R3.PA10.IH R3.PM10.I R3.PA11.IH R3.PM11.I
"numeric" "numeric" "numeric" "numeric"
R3.PA12.IH R3.PM12.I R3.F R3.DF
"numeric" "numeric" "numeric" "numeric"
R3.PA.Z R3.PA.ZH R3.S R4.PA1.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM1.V R4.PA2.VH R4.PM2.V R4.PA3.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM3.V R4.PA4.IH R4.PM4.I R4.PA5.IH
"numeric" "numeric" "numeric" "numeric"
R4.PM5.I R4.PA6.IH R4.PM6.I R4.PA7.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM7.V R4.PA8.VH R4.PM8.V R4.PA9.VH
"numeric" "numeric" "numeric" "numeric"
R4.PM9.V R4.PA10.IH R4.PM10.I R4.PA11.IH
"numeric" "numeric" "numeric" "numeric"
R4.PM11.I R4.PA12.IH R4.PM12.I R4.F
"numeric" "numeric" "numeric" "numeric"
R4.DF R4.PA.Z R4.PA.ZH R4.S
"numeric" "numeric" "numeric" "numeric"
control_panel_log1 control_panel_log2 control_panel_log3 control_panel_log4
"numeric" "numeric" "numeric" "numeric"
relay1_log relay2_log relay3_log relay4_log
"numeric" "numeric" "numeric" "numeric"
snort_log1 snort_log2 snort_log3 snort_log4
"numeric" "numeric" "numeric" "numeric"
marker
"factor"
To remove rows with Inf values you can use:
ICS_data[rowSums(sapply(ICS_data[-ncol(ICS_data)], is.infinite)) == 0, ]
Or using dplyr:
library(dplyr)
ICS_data %>% filter_at(-ncol(.), all_vars(is.finite(.)))
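In recent dplyr releases the scoped verbs such as filter_at are marked as superseded. A minimal sketch of the same filter written with if_all, assuming dplyr 1.0.4 or later is installed; the toy data frame below is hypothetical and only stands in for ICS_data. Because it tests is.finite, it also drops rows containing NA or NaN, just like the filter_at version.

```r
library(dplyr)

# Hypothetical toy data standing in for ICS_data
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(2, Inf, 4, 5),
  marker = factor(c("x", "y", "x", "y"))
)

# Keep only rows where every numeric column holds a finite value;
# the factor column is skipped by where(is.numeric)
clean <- toy %>% filter(if_all(where(is.numeric), is.finite))
```

Selecting columns by type rather than by position means the factor column does not have to be the last one.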
We can break the code into smaller steps to understand how it works.
Consider this data.
data <- data.frame(a = 1:4, b = 2:5, c = letters[1:4], stringsAsFactors = TRUE)
data$b[2] <- Inf
data
# a b c
#1 1 2 a
#2 2 Inf b
#3 3 4 c
#4 4 5 d
First we remove the last column from data. We drop it because the last column is a factor and we don't want to include it when looking for infinite values. So we get only the numeric columns.
data[-ncol(data)]
# a b
#1 1 2
#2 2 Inf
#3 3 4
#4 4 5
Next, using sapply, we check each column for infinite values with is.infinite. This returns a matrix of TRUE / FALSE values.
sapply(data[-ncol(data)], is.infinite)
# a b
#[1,] FALSE FALSE
#[2,] FALSE TRUE
#[3,] FALSE FALSE
#[4,] FALSE FALSE
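One detail worth keeping in mind: is.infinite flags only Inf and -Inf, whereas is.finite (used in the dplyr version) is also FALSE for NA and NaN. A quick check on a hypothetical vector x illustrates the difference:

```r
# Hypothetical vector mixing ordinary, infinite and missing values
x <- c(1, Inf, -Inf, NA, NaN)

inf_flags <- is.infinite(x)  # TRUE only for Inf and -Inf
fin_flags <- is.finite(x)    # TRUE only for ordinary numbers
```

So a filter built on is.infinite keeps rows with NA or NaN, while one built on is.finite removes them as well.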
We can sum these logical values using rowSums. Here TRUE is counted as 1 and FALSE as 0.
rowSums(sapply(data[-ncol(data)], is.infinite))
#[1] 0 1 0 0
From this we know that the second row has 1 infinite value and we need to drop it. So we select the rows that have 0 infinite values.
data[rowSums(sapply(data[-ncol(data)], is.infinite)) == 0, ]
# a b c
#1 1 2 a
#3 3 4 c
#4 4 5 d
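The steps above assume that the only non-numeric column is the last one. As a rough sketch, the same idea can be wrapped in a helper that picks the numeric columns by type instead; drop_inf_rows is a hypothetical name introduced here for illustration, not a function from base R or any package.

```r
# Hypothetical helper: drop rows that have Inf or -Inf in any numeric column
drop_inf_rows <- function(df) {
  # Logical vector marking which columns are numeric
  num_cols <- vapply(df, is.numeric, logical(1))
  if (!any(num_cols)) return(df)
  # Combine the per-column is.infinite results row by row with OR
  has_inf <- Reduce(`|`, lapply(df[num_cols], is.infinite))
  df[!has_inf, , drop = FALSE]
}

data <- data.frame(a = 1:4, b = 2:5, c = letters[1:4], stringsAsFactors = TRUE)
data$b[2] <- Inf
result <- drop_inf_rows(data)
```

With the example data this keeps rows 1, 3 and 4, matching the output above. Rows containing NA or NaN are left in place, so na.omit would still be needed for those.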