
How can I remove rows with Inf from my dataframe in R?

I have a large data frame (ICS_data) with about 129 columns (variables) and 5276 rows. Some of the rows contain Inf values in one or more variables. I used na.omit(df) to remove rows with NA and NaN, but I am still getting errors. When I searched SO for similar errors, I found ICS_data[is.finite(rowSums(ICS_data)),] as a possible solution, but running it on my data frame gives another error:

    > powerdata <- ICS_data[is.finite(rowSums(ICS_data)),]
    Error in rowSums(ICS_data): 'x' must be numeric

I have checked my dataset, and all columns are numeric except my reference variable, which is a factor. Can someone please help me out?

    > sapply(ICS_data, class)
     R1.PA1.VH           R1.PM1.V          R1.PA2.VH           R1.PM2.V 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R1.PA3.VH           R1.PM3.V          R1.PA4.IH           R1.PM4.I 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R1.PA5.IH           R1.PM5.I          R1.PA6.IH           R1.PM6.I 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R1.PA7.VH           R1.PM7.V          R1.PA8.VH           R1.PM8.V 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R1.PA9.VH           R1.PM9.V         R1.PA10.IH          R1.PM10.I 
     "numeric"          "numeric"          "numeric"          "numeric" 
    R1.PA11.IH          R1.PM11.I         R1.PA12.IH          R1.PM12.I 
     "numeric"          "numeric"          "numeric"          "numeric" 
          R1.F              R1.DF            R1.PA.Z           R1.PA.ZH 
     "numeric"          "numeric"          "numeric"          "numeric" 
          R1.S          R2.PA1.VH           R2.PM1.V          R2.PA2.VH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R2.PM2.V          R2.PA3.VH           R2.PM3.V          R2.PA4.IH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R2.PM4.I          R2.PA5.IH           R2.PM5.I          R2.PA6.IH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R2.PM6.I          R2.PA7.VH           R2.PM7.V          R2.PA8.VH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R2.PM8.V          R2.PA9.VH           R2.PM9.V         R2.PA10.IH 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R2.PM10.I         R2.PA11.IH          R2.PM11.I         R2.PA12.IH 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R2.PM12.I               R2.F              R2.DF            R2.PA.Z 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R2.PA.ZH               R2.S          R3.PA1.VH           R3.PM1.V 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R3.PA2.VH           R3.PM2.V          R3.PA3.VH           R3.PM3.V 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R3.PA4.IH           R3.PM4.I          R3.PA5.IH           R3.PM5.I 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R3.PA6.IH           R3.PM6.I          R3.PA7.VH           R3.PM7.V 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R3.PA8.VH           R3.PM8.V          R3.PA9.VH           R3.PM9.V 
     "numeric"          "numeric"          "numeric"          "numeric" 
    R3.PA10.IH          R3.PM10.I         R3.PA11.IH          R3.PM11.I 
     "numeric"          "numeric"          "numeric"          "numeric" 
    R3.PA12.IH          R3.PM12.I               R3.F              R3.DF 
     "numeric"          "numeric"          "numeric"          "numeric" 
       R3.PA.Z           R3.PA.ZH               R3.S          R4.PA1.VH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R4.PM1.V          R4.PA2.VH           R4.PM2.V          R4.PA3.VH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R4.PM3.V          R4.PA4.IH           R4.PM4.I          R4.PA5.IH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R4.PM5.I          R4.PA6.IH           R4.PM6.I          R4.PA7.VH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R4.PM7.V          R4.PA8.VH           R4.PM8.V          R4.PA9.VH 
     "numeric"          "numeric"          "numeric"          "numeric" 
      R4.PM9.V         R4.PA10.IH          R4.PM10.I         R4.PA11.IH 
     "numeric"          "numeric"          "numeric"          "numeric" 
     R4.PM11.I         R4.PA12.IH          R4.PM12.I               R4.F 
     "numeric"          "numeric"          "numeric"          "numeric" 
         R4.DF            R4.PA.Z           R4.PA.ZH               R4.S 
     "numeric"          "numeric"          "numeric"          "numeric" 
    control_panel_log1 control_panel_log2 control_panel_log3 control_panel_log4 
     "numeric"          "numeric"          "numeric"          "numeric" 
    relay1_log         relay2_log         relay3_log         relay4_log 
     "numeric"          "numeric"          "numeric"          "numeric" 
    snort_log1         snort_log2         snort_log3         snort_log4 
     "numeric"          "numeric"          "numeric"          "numeric" 
        marker 
      "factor"

To remove rows with Inf values, you can use:

ICS_data[rowSums(sapply(ICS_data[-ncol(ICS_data)], is.infinite)) == 0, ]

Or using dplyr:

library(dplyr)
ICS_data %>% filter_at(-ncol(.), all_vars(is.finite(.)))
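
Both snippets above assume the factor column is the last one. If that is not guaranteed, the numeric columns can be picked by type instead of by position. The following is a minimal base R sketch under that assumption; `toy`, `num_cols`, `n_inf` and `clean` are hypothetical names used only for illustration:

```r
# Hypothetical toy data: the factor column is NOT the last column here.
toy <- data.frame(
  marker = factor(c("x", "y", "z")),
  a = c(1, Inf, 3),
  b = c(4, 5, -Inf)
)

# Logical vector marking which columns are numeric.
num_cols <- vapply(toy, is.numeric, logical(1))

# Count infinite values per row, looking at the numeric columns only.
n_inf <- rowSums(sapply(toy[num_cols], is.infinite))

# Keep the rows that contain no infinite values.
clean <- toy[n_inf == 0, ]
clean
```

Selecting by type keeps the filter working if columns are reordered or more non-numeric columns are added later.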

We can break the code into smaller steps to understand how it works.

Consider this data.

data <- data.frame(a = 1:4, b = 2:5, c = letters[1:4], stringsAsFactors = TRUE)
data$b[2] <- Inf
data
#  a   b c
#1 1   2 a
#2 2 Inf b
#3 3   4 c
#4 4   5 d

First, we drop the last column from data. It is a factor, and we don't want it included when searching for infinite values, so only the numeric columns remain.

data[-ncol(data)]

#  a   b
#1 1   2
#2 2 Inf
#3 3   4
#4 4   5
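
Leaving the factor column in is what appears to trigger the error from the question: `rowSums` does not accept a data frame that contains a non-numeric column. A small sketch with the same example data, where `msg` is a hypothetical name for the captured error message:

```r
data <- data.frame(a = 1:4, b = 2:5, c = letters[1:4], stringsAsFactors = TRUE)
data$b[2] <- Inf

# rowSums() on the full data frame fails because column c is a factor.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  rowSums(data),
  error = function(e) conditionMessage(e)
)
msg

# After dropping the last column only numeric columns remain, so this works.
sums <- rowSums(data[-ncol(data)])
sums
```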

Next, using sapply, we apply is.infinite to each column to find which values are infinite. This returns a matrix of TRUE/FALSE values.

sapply(data[-ncol(data)], is.infinite)

#         a     b
#[1,] FALSE FALSE
#[2,] FALSE  TRUE
#[3,] FALSE FALSE
#[4,] FALSE FALSE
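
One caveat worth knowing: `sapply` only returns a matrix when each column yields more than one value. With a single-row data frame it simplifies to a plain vector, which `rowSums` would reject. A hedged sketch of a variant that converts to a matrix first; `one_row`, `flags_vec` and `flags_mat` are hypothetical names:

```r
# Hypothetical one-row data frame.
one_row <- data.frame(a = 1, b = Inf)

# sapply() simplifies to a named logical vector here, not a matrix.
flags_vec <- sapply(one_row, is.infinite)
is.matrix(flags_vec)

# Converting to a matrix first keeps the dimensions, so rowSums() still works.
flags_mat <- is.infinite(as.matrix(one_row))
dim(flags_mat)
rowSums(flags_mat)
```

For a data set with thousands of rows, as in the question, the `sapply` version behaves the same way; the difference only matters for edge cases such as a one-row subset.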

We can sum these logical values row by row using rowSums. Here TRUE counts as 1 and FALSE as 0.

rowSums(sapply(data[-ncol(data)], is.infinite))
#[1] 0 1 0 0
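
To see the counting in isolation, here is a small sketch on a hand-built logical matrix. The names `flags`, `counts` and `any_true` are hypothetical. It also shows that testing for a row sum of zero is the same as asking whether a row has no TRUE at all:

```r
# Hypothetical 3 x 2 logical matrix: only the second row contains a TRUE.
flags <- matrix(
  c(FALSE, FALSE,
    FALSE, TRUE,
    FALSE, FALSE),
  nrow = 3, byrow = TRUE
)

# Each TRUE contributes 1 to its row total, each FALSE contributes 0.
counts <- rowSums(flags)
counts

# Equivalent row-wise check without arithmetic: does the row contain any TRUE?
any_true <- apply(flags, 1, any)
any_true
```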

This tells us that the second row has one infinite value and needs to be dropped. So we select the rows that have zero infinite values.

data[rowSums(sapply(data[-ncol(data)], is.infinite)) == 0, ]

#  a b c
#1 1 2 a
#3 3 4 c
#4 4 5 d
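
Note that `is.infinite` and `is.finite` are not exact opposites: `is.infinite` flags only Inf and -Inf, while `is.finite` is also FALSE for NA and NaN. The base R version above therefore keeps rows with missing values, whereas a filter built on `is.finite` drops them as well. A minimal sketch of the difference, using a hypothetical data frame `data_na`:

```r
# Hypothetical data with one NA and one Inf in different rows.
data_na <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(2, 3, Inf, 5),
  c = letters[1:4],
  stringsAsFactors = TRUE
)

# Numeric part as a matrix (the last column is the factor).
num <- as.matrix(data_na[-ncol(data_na)])

# Drops only the row containing Inf; the row with NA is kept.
only_inf_removed <- data_na[rowSums(is.infinite(num)) == 0, ]

# Drops rows containing Inf, -Inf, NA or NaN in one pass.
all_finite <- data_na[rowSums(!is.finite(num)) == 0, ]

nrow(only_inf_removed)
nrow(all_finite)
```

Which of the two is appropriate depends on whether missing values should also be removed at this step.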
