
Deleting rows in R based on values over multiple columns

I'm trying to remove rows from a data frame.

I need to remove rows that contain only "NONE" or white space across the entire range of columns I provide. A row must be removed if its values in those columns are all "NONE", all white space, or a mix of only "NONE" and white space.

In some cases it is fine for a few of the columns to contain "NONE" or white space. Because of that, I can't simply filter out rows when reading in the CSV with a per-column condition like

dataframe$col1 =="NONE" | str_length(dataframe$col1)==0

I know this would normally be a simple problem. I could run a for loop that turns every "NONE" value and white-space string in the data frame into NA, and then use complete.cases across whichever columns I need (doc). However, I'm specifically being asked to use a method that does not change the values. Any suggestions?
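
One way to meet the "don't change the values" constraint is to build a logical mask from the selected columns and subset with it. Every stored cell stays exactly as it was. Below is a minimal base-R sketch of that idea. It assumes character (or factor) columns, and it does not treat NA cells as blank. The helper name drop_none_blank_rows and the small data frame are hypothetical, made up for illustration.

```r
# Hypothetical helper: drop the rows whose values in `cols` are all
# "NONE" or blank. Only a logical mask is built; no cell is rewritten.
drop_none_blank_rows <- function(df, cols) {
  # One mask column per selected column: TRUE where the cell is "NONE",
  # empty or whitespace-only (NA cells are deliberately not flagged)
  flagged <- vapply(df[cols], function(x) {
    x <- as.character(x)
    !is.na(x) & (x == "NONE" | trimws(x) == "")
  }, logical(nrow(df)))
  # vapply() returns a plain vector when df has one row; restore the matrix shape
  flagged <- matrix(flagged, nrow = nrow(df))
  # Keep a row if at least one selected cell is not flagged
  df[rowSums(!flagged) > 0, , drop = FALSE]
}

df <- data.frame(col1 = c("a", "NONE", "NONE", ""),
                 col2 = c("NONE", "", "b", "  "),
                 stringsAsFactors = FALSE)

kept <- drop_none_blank_rows(df, c("col1", "col2"))
kept  # rows 1 and 3 remain; df itself is unchanged
```

Row 3 survives because col2 holds "b", even though col1 is "NONE". That is the "NONE is fine in some columns" case from the question.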

Edit: I don't have the data, but here is a made-up example of a data frame similar to what I have to work with:

(Image of the example data frame; not available.)

In this data frame, the only row that must be removed is row 3 (row 4 if you count the header).

The final dataset will have many more columns than this made-up example.

I would recommend using the filter() function from the dplyr package (part of the tidyverse). It would look something like this:

dataframe_new <- dplyr::filter(dataframe, col1 == "NONE" | stringr::str_length(col1) == 0)

Since you need to remove the rows that have "NONE" or blanks, it would be:

dataframe <- dplyr::filter(dataframe, col1 != "NONE" & stringr::str_length(col1) != 0)
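
The condition str_length(col1) != 0 only rules out truly empty strings. A cell containing just spaces has a non-zero length, so it passes this filter. If "white space" in the question includes such cells, trimming before comparing catches both cases. Here is a small base-R sketch. Base nchar() gives the same counts as stringr::str_length() for these plain ASCII strings, and the helper name is_none_or_blank is hypothetical.

```r
# Hypothetical helper: TRUE for "NONE", "" and whitespace-only strings
is_none_or_blank <- function(x) x == "NONE" | trimws(x) == ""

x <- c("a", "", "   ", "NONE")
length_zero <- nchar(x) == 0       # FALSE TRUE FALSE FALSE: "   " is missed
none_blank  <- is_none_or_blank(x) # FALSE TRUE TRUE TRUE
length_zero
none_blank
```

Inside the filter() call above, the condition could then be written as !is_none_or_blank(col1). Note that it still looks at col1 only.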

You can use dplyr::filter_all() to accomplish this:

library(dplyr)

df <- data.frame(column.1 = c('a', 'b', 'NONE', 'b', 'b'),
                 column.2 = c('a', 'b', '', 'b', 'b'),
                 column.3 = rep('', 5),
                 column.4 = rep('', 5),
                 column.5 = rep('', 5))

df %>%
  filter_all(any_vars(. != 'NONE' & . != ''))
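
The scoped verbs such as filter_all() and any_vars() are superseded as of dplyr 1.0.0, though they still work. In newer dplyr, if_any() inside filter() expresses the same "at least one column is non-empty" check, and it limits the check to the columns you name. The sketch below assumes dplyr 1.0.4 or later, where if_any() was added. It also treats whitespace-only strings as blank via trimws(), which is an extra assumption about what counts as "white space". The column range column.1:column.3 is only an example.

```r
library(dplyr)

df <- data.frame(column.1 = c('a', 'b', 'NONE', 'b', 'b'),
                 column.2 = c('a', 'b', '', 'b', 'b'),
                 column.3 = rep('', 5),
                 stringsAsFactors = FALSE)

# Keep a row if at least one of the chosen columns holds something other
# than "NONE" or an empty/whitespace-only string
kept <- df %>%
  filter(if_any(column.1:column.3, ~ .x != "NONE" & trimws(.x) != ""))
kept
```

Only the third row is dropped. Its three checked cells are "NONE", "" and "".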

# Per-cell predicates
is.none <- function(x) tolower(x) == "none"
is.whitespace <- function(x) grepl("^\\s+$", x)
is.empty <- function(x) length(x) == 0 || x == "" || is.na(x) || is.nan(x)
is.none.whitespace.empty <- function(x) is.none(x) || is.whitespace(x) || is.empty(x)

# Apply the scalar check element by element
is.none.whitespace.empty <- Vectorize(is.none.whitespace.empty)

# Drop the rows whose cells in `cols` are all "NONE", whitespace or empty
remove.empty.rows <- function(df, cols) {
  all.empty <- vapply(seq_len(nrow(df)),
                      function(i) all(is.none.whitespace.empty(df[i, cols])),
                      logical(1))
  df[!all.empty, , drop = FALSE]
}

Now you can test it:

# in your case:
remove.empty.rows(df, 1) # remove if first column content is "empty"

# You can also choose which columns must all be "empty";
# e.g. to check only the first, third and fifth columns:
remove.empty.rows(df, c(1, 3, 5))

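Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.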

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM