Remove rows in R based on values in multiple columns

Question

I'm trying to remove rows from a dataframe.

I need to remove rows that have only "NONE" or white space across the entire range of columns I provide. A row should be removed only if every one of those columns is "NONE" or white space: all "NONE", all white space, or any mix of the two.

Because there are cases where having "NONE" or white space in some of the columns is okay, I can't just filter out rows when reading in the csv with something like

dataframe$col1 =="NONE" | str_length(dataframe$col1)==0

I know this would normally be a simple problem where I could run a for loop that turns all "NONE" values and whitespace in a dataframe to NA and use complete.cases across whichever columns I need (doc). However, I'm specifically being asked to use a method that does not change the values. Any suggestions?

Edit: I don't have the data, but here is a made-up example of a dataframe that would be similar to what I have to work with

In this dataframe the only row that must be removed is row 3 (or row 4 if you include the header).

The final dataset will have many more columns than this made-up example.

Answer 1

I would recommend using the filter() command from the dplyr package (part of the tidyverse). It would look something like this:

dataframe_new <- filter(dataframe, col1 == "" | str_length(col1) == 0)

Answer 2

Since you need to remove the rows that have NONE and white space, it would be:

dataframe <- filter(dataframe, col1 != "NONE" & str_length(col1) != 0)

Answer 3

You can use dplyr::filter_all() to accomplish this:

library(dplyr)

df <- data.frame(column.1 = c('a', 'b', 'NONE', 'b', 'b'),
                 column.2 = c('a', 'b', '', 'b', 'b'),
                 column.3 = rep('', 5),
                 column.4 = rep('', 5),
                 column.5 = rep('', 5))

df %>%
  filter_all(any_vars(. != 'NONE' & . != ''))
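In dplyr 1.0.4 and later, filter_all() is superseded and the same idea can be written with filter() and if_any(). The following is only a sketch under that assumption: it restricts the check to a column range (column.1:column.5 here, matching the example data) and also treats whitespace-only cells as blank via trimws(), which the original call does not do.

```r
library(dplyr)

df <- data.frame(column.1 = c("a", "b", "NONE", "b", "b"),
                 column.2 = c("a", "b", "", "b", "b"),
                 column.3 = rep("", 5),
                 column.4 = rep("", 5),
                 column.5 = rep("", 5),
                 stringsAsFactors = FALSE)

# Keep a row if at least one of the selected columns holds a real value,
# i.e. something other than "NONE", "" or whitespace only.
# No cell values are modified; rows are only kept or dropped.
result <- df %>%
  filter(if_any(column.1:column.5, ~ .x != "NONE" & trimws(.x) != ""))

print(result)
```

Only the third row, which is "NONE" in column.1 and empty everywhere else, should be dropped.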

Answer 4

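# Each predicate takes a single value (one cell of the data frame).
is.none <- function(x) tolower(x) == "none"
is.whitespace <- function(x) grepl("^\\s+$", x)
is.empty <- function(x) length(x) == 0 || x == "" || is.na(x) || is.nan(x)
is.none.whitespace.empty <- function(x) is.none(x) || is.whitespace(x) || is.empty(x)

# Vectorize so the combined predicate can be applied to every cell of a row.
is.none.whitespace.empty <- Vectorize(is.none.whitespace.empty)

# Drop the rows in which all of the columns in `cols` are "NONE", whitespace, or empty.
remove.empty.rows <- function(df, cols) {
  df[!vapply(seq_len(nrow(df)),
             function(i) all(is.none.whitespace.empty(df[i, cols])),
             logical(1)), ]
}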

Now you can test it:

# in your case:
remove.empty.rows(df, 1) # remove if first column content is "empty"

# but you can determine which columns should be examined for being all
# "empty".
# let's say, you want to evaluate only first, third and fifth column:
remove.empty.rows(df, c(1, 3, 5))
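For wide data, a column-wise variant of the same idea may be faster than looping over rows. The sketch below uses base R only; drop_blank_rows is a hypothetical helper name, and it assumes "NONE" should match exactly (unlike is.none above, which ignores case). It builds a logical matrix of blank cells and keeps the rows that have at least one non-blank cell, without changing any values.

```r
# Hypothetical helper, not part of any package.
drop_blank_rows <- function(df, cols) {
  # One logical column per selected column: TRUE where the cell is
  # NA, exactly "NONE", empty, or whitespace only.
  blank <- do.call(cbind, lapply(df[cols], function(x) {
    x <- as.character(x)
    is.na(x) | x == "NONE" | trimws(x) == ""
  }))
  # Keep rows with at least one non-blank cell among the selected columns.
  keep <- rowSums(!blank) > 0
  df[keep, , drop = FALSE]
}

# Made-up data for illustration only.
example <- data.frame(a = c("x", "NONE", " "),
                      b = c("", "", "NONE"),
                      c = c("NONE", "y", "  "),
                      stringsAsFactors = FALSE)

by_ab  <- drop_blank_rows(example, c("a", "b"))  # check columns a and b only
by_all <- drop_blank_rows(example, 1:3)          # check all three columns
```

Checking only a and b keeps just the first row, while checking all three columns also keeps the second row because of the "y" in column c.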

4 answers

Answer 1
2 2018-12-15 01:02:06

Answer 2
2 2018-12-15 01:12:00

Answer 3
2 accepted 2018-12-15 02:10:42

Answer 4
1 2018-12-15 01:43:38
