
Check if a data frame column has empty cells

I am checking whether my column (name) has any empty cells, but I get an error. Is there a solution? This is what I tried. Also, how can I handle cells that contain only spaces? I want to strip the spaces and then check whether the cell is empty. I do not want to change the original name column; I only want to ignore spaces or NA while checking whether the cells are empty.

df8 <- data.frame(name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","dgsy,aaf","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","","sgyu,hytb","vdti,kula","mftyu,huta","","cday,bhsue","ajtu,nudj"),
                  email=c("xab.try@ybcd.com","Lan.xab@ybcd.com","tth.vgu@ybcd.com","mmc.vgtu@ybcd.com","aaf.dgsy@ybcd.com","nnhu.kull@ybcd.com","njam.hula@ybcd.com","jiha.mund@ybcd.com","ntha.htfy@ybcd.com","gydbt.bhr@ybcd.com","hytb.sgyu@ybcd.com","kula.vdti@ybcd.com","huta.mftyu@ybcd.com","ggat.khul@ybcd.com","bhsue.cday@ybcd.com","nudj.ajtu@ybcd.com"))

df8 <- df8 %>% mutate(is_blank_node = which(df8$name == "", arr.ind = TRUE),1 )
Error:

Error: Problem with mutate() input is_blank_node.
x Input is_blank_name can't be recycled to size 182753.
i Input is_blank_name is which(df$Name == "", arr.ind = TRUE).
i Input is_blank_name must be size 182753 or 1, not 0.

Expected output: [image not available]

You don't need which at all. In fact, it causes the error: which returns only the positions of the TRUE values, so with this sample data the result has length 2 (rows 10 and 14) instead of one value per row, and mutate cannot recycle it. mutate can take the result of name == "" directly. dplyr also already knows to evaluate the column name within df8, so you can (and should) omit df8$:

df8 <- data.frame(name=c("try,xab","xab,Lan","mhy,mun","vgtu,mmc","dgsy,aaf","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","","sgyu,hytb","vdti,kula","mftyu,huta","","cday,bhsue","ajtu,nudj"),
                  email=c("xab.try@ybcd.com","Lan.xab@ybcd.com","tth.vgu@ybcd.com","mmc.vgtu@ybcd.com","aaf.dgsy@ybcd.com","nnhu.kull@ybcd.com","njam.hula@ybcd.com","jiha.mund@ybcd.com","ntha.htfy@ybcd.com","gydbt.bhr@ybcd.com","hytb.sgyu@ybcd.com","kula.vdti@ybcd.com","huta.mftyu@ybcd.com","ggat.khul@ybcd.com","bhsue.cday@ybcd.com","nudj.ajtu@ybcd.com"))

library(tidyverse)

df8 %>% 
  mutate(is_blank_node = name == "")
#>          name               email is_blank_node
#> 1     try,xab    xab.try@ybcd.com         FALSE
#> 2     xab,Lan    Lan.xab@ybcd.com         FALSE
#> 3     mhy,mun    tth.vgu@ybcd.com         FALSE
#> 4    vgtu,mmc   mmc.vgtu@ybcd.com         FALSE
#> 5    dgsy,aaf   aaf.dgsy@ybcd.com         FALSE
#> 6   kull,nnhu  nnhu.kull@ybcd.com         FALSE
#> 7   hula,njam  njam.hula@ybcd.com         FALSE
#> 8   mund,jiha  jiha.mund@ybcd.com         FALSE
#> 9   htfy,ntha  ntha.htfy@ybcd.com         FALSE
#> 10             gydbt.bhr@ybcd.com          TRUE
#> 11  sgyu,hytb  hytb.sgyu@ybcd.com         FALSE
#> 12  vdti,kula  kula.vdti@ybcd.com         FALSE
#> 13 mftyu,huta huta.mftyu@ybcd.com         FALSE
#> 14             ggat.khul@ybcd.com          TRUE
#> 15 cday,bhsue bhsue.cday@ybcd.com         FALSE
#> 16  ajtu,nudj  nudj.ajtu@ybcd.com         FALSE

Created on 2020-09-17 by the reprex package (v0.3.0)
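To see why the which() result cannot be recycled, it may help to compare lengths on a small vector. The sketch below uses a hypothetical four-element vector rather than df8, and only base R:

```r
# Hypothetical sample vector (not from the original data)
name <- c("try,xab", "", "mhy,mun", "")

is_blank <- name == ""   # one logical value per element
pos <- which(is_blank)   # only the positions of the TRUE values

length(is_blank)  # 4, matches the number of rows
length(pos)       # 2, so mutate() cannot align it with the rows
pos               # 2 4
```

When no cell is empty, which() returns a zero-length vector, which is consistent with the "not 0" in the error message above.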

Update

TRUE and FALSE are basically equivalent to 1 and 0, just of type logical instead of integer / numeric. You can try this with TRUE * 1, which turns the logical into a numeric value, or use as.integer directly. To handle cells that contain only whitespace or NA, you can include extra steps. Since this gets a bit verbose, we can wrap it in a function:

check_blank <- function(x) {
  as.integer(trimws(ifelse(is.na(x), "", x)) == "")
}

df8 %>% 
  mutate(is_blank_node = check_blank(name))
#>          name               email is_blank_node
#> 1     try,xab    xab.try@ybcd.com             0
#> 2     xab,Lan    Lan.xab@ybcd.com             0
#> 3     mhy,mun    tth.vgu@ybcd.com             0
#> 4    vgtu,mmc   mmc.vgtu@ybcd.com             0
#> 5    dgsy,aaf   aaf.dgsy@ybcd.com             0
#> 6   kull,nnhu  nnhu.kull@ybcd.com             0
#> 7   hula,njam  njam.hula@ybcd.com             0
#> 8   mund,jiha  jiha.mund@ybcd.com             0
#> 9   htfy,ntha  ntha.htfy@ybcd.com             0
#> 10             gydbt.bhr@ybcd.com             1
#> 11  sgyu,hytb  hytb.sgyu@ybcd.com             0
#> 12  vdti,kula  kula.vdti@ybcd.com             0
#> 13 mftyu,huta huta.mftyu@ybcd.com             0
#> 14             ggat.khul@ybcd.com             1
#> 15 cday,bhsue bhsue.cday@ybcd.com             0
#> 16  ajtu,nudj  nudj.ajtu@ybcd.com             0

Created on 2020-09-17 by the reprex package (v0.3.0)
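The sample data contains no whitespace-only or NA cells, so here is a small check of check_blank on those cases. The input vector x is hypothetical and only base R is used:

```r
check_blank <- function(x) {
  as.integer(trimws(ifelse(is.na(x), "", x)) == "")
}

# Hypothetical inputs: normal text, empty, spaces only, NA, padded text
x <- c("abc", "", "   ", NA, " a ")

check_blank(x)  # 0 1 1 1 0
x               # unchanged; trimming happens only inside the function

# Logical to number conversions mentioned above
TRUE * 1           # 1 (numeric)
as.integer(FALSE)  # 0 (integer)
```

Because the trimming is applied to a temporary copy inside the function, the original column keeps its spaces and NA values.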

Maybe you can try nchar like below:

df8 %>%
  mutate(is_blank_node = +(nchar(name)==0))

or nzchar:

df8 %>%
  mutate(is_blank_node = +!nzchar(name))

which gives

> df8 %>%
+   mutate(is_blank_node = +(nchar(name)==0))
         name               email is_blank_node
1     try,xab    xab.try@ybcd.com             0
2     xab,Lan    Lan.xab@ybcd.com             0
3     mhy,mun    tth.vgu@ybcd.com             0
4    vgtu,mmc   mmc.vgtu@ybcd.com             0
5    dgsy,aaf   aaf.dgsy@ybcd.com             0
6   kull,nnhu  nnhu.kull@ybcd.com             0
7   hula,njam  njam.hula@ybcd.com             0
8   mund,jiha  jiha.mund@ybcd.com             0
9   htfy,ntha  ntha.htfy@ybcd.com             0
10             gydbt.bhr@ybcd.com             1
11  sgyu,hytb  hytb.sgyu@ybcd.com             0
12  vdti,kula  kula.vdti@ybcd.com             0
13 mftyu,huta huta.mftyu@ybcd.com             0
14             ggat.khul@ybcd.com             1
15 cday,bhsue bhsue.cday@ybcd.com             0
16  ajtu,nudj  nudj.ajtu@ybcd.com             0
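The two variants behave differently on NA, and neither treats a whitespace-only cell as empty. The sketch below shows this on a hypothetical vector, plus one possible way to also count whitespace-only and NA cells (that last expression is an addition, not part of the answer above):

```r
# Hypothetical inputs: normal text, empty, spaces only, NA
x <- c("abc", "", "  ", NA)

+(nchar(x) == 0)  # 0 1 0 NA  (nchar() returns NA for a missing string)
+!nzchar(x)       # 0 1 0 0   (nzchar() is TRUE for NA by default)

# One way to flag whitespace-only and NA cells as well
+(is.na(x) | !nzchar(trimws(x)))  # 0 1 1 1
```

The unary + turns the logical vector into integers, the same effect as as.integer.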

Base R solution, checking for empty strings in every column:

data.frame(+(t(apply(df8, 1, `==`, ""))))

Base R, with the results column-bound to the original data.frame:

cbind(df8, setNames(data.frame(+(t(apply(df8, 1, `==`, "")))), 
         paste("empty", names(df8), sep = "_")))
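As a possibly simpler sketch of the same idea, a data frame can be compared with "" directly, which yields a logical matrix without apply and t. The three-row data frame df below is hypothetical:

```r
# Hypothetical small data frame with one empty cell per column
df <- data.frame(name  = c("a", "", "c"),
                 email = c("x@y.com", "z@y.com", ""))

# df == "" is a logical matrix; unary + converts it to 0/1 integers
flags <- as.data.frame(+(df == ""))
names(flags) <- paste("empty", names(df), sep = "_")

cbind(df, flags)
```

This gives the columns empty_name and empty_email next to the original ones, and leaves df itself unchanged.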

This will add a column with your desired values (1 or 0):

df8 <- df8 %>% mutate(is_blank_node = ifelse(name == "", 1, 0))

Update: added a line that trims leading and trailing whitespace from the column before checking whether the cell is empty. Note that this overwrites the name column with the trimmed values:

df8 <- df8 %>% 
  mutate(name = trimws(name, which = "both")) %>%
  mutate(is_blank_node = ifelse(name == "", 1, 0))

Update 2: This gives a 1 to any cell that is empty or contains only whitespace (no matter how many spaces), and a 0 to anything else. It does not change the contents of the original column.

library(tidyverse)

df8 <- df8 %>% mutate(is_blank_node = ifelse(name == "" | str_detect(name, '^\\s*$'), 1, 0))
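The question also mentions NA. With the expression above, an NA cell yields NA rather than 1, because both name == "" and str_detect return NA for missing values. A minimal sketch that treats NA as blank too, assuming dplyr and stringr are installed and using a hypothetical data frame df:

```r
library(dplyr)
library(stringr)

# Hypothetical inputs: normal text, empty, spaces only, NA
df <- data.frame(name = c("a,b", "", "   ", NA))

res <- df %>%
  # "^\\s*$" already matches the empty string, so name == "" is not needed;
  # TRUE | NA is TRUE, so NA cells are flagged as blank
  mutate(is_blank_node = as.integer(is.na(name) | str_detect(name, "^\\s*$")))

res$is_blank_node  # 0 1 1 1
```

The name column is left as it was, since only the new column is assigned.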
