R: removing rows in a data frame with cells that do not contain a single value

I have a data frame from which I want to remove rows that have multiple values in certain columns.

Using the following example data:

my.data <- structure(list(alts = structure(c(2L, 16L, 8L), .Label = c("A", 
"C", "c(\"\", \"A\")", "c(\"A\", \"\")", "c(\"\", \"C\")", "c(\"C\", \"\")", 
"c(\"C\", \"C,G\")", "c(\"C,G\", \"G\")", "c(\"\", \"G\")", "c(\"G\", \"\")", 
"c(\"G\", \"A\")", "c(\"\", \"T\")", "c(\"T\", \"\")", "c(\"\", \"T\", \"C\")", 
"G", "T"), class = "factor"), Coordinate = c(13687520L, 13687570L, 
13687591L), `3115` = c("C", "T", "C,G"), `3124` = c("C", "T", 
"C,G"), `9582` = c("C", "T", "C,G"), `9583` = c("C", "T", "C,G"
), `9584` = c("C", "T", "G"), `9585` = c("C", "T", "C,G"), `9586` = c("C", 
"T", "C,G"), `9587` = c("C", "T", "C,G"), `9588` = c("C", "T", 
"C,G"), `9590` = c("C", "T", "C,G"), `9592` = c("C", "T", "C,G"
), `9593` = c("C", "T", "C,G"), `9594` = c("C", "T", "G"), `9595` = c("C", 
"T", "C,G"), `9596` = c("C", "T", "C,G"), `9597` = c("C", "T", 
"C,G"), `9598` = c("C", "T", "C,G"), `9599` = c("C", "T", "C,G"
), `9600` = c("C", "T", "C,G"), `9601` = c("C", "T", "C,G")), .Names = c("alts", 
"Coordinate", "3115", "3124", "9582", "9583", "9584", "9585", 
"9586", "9587", "9588", "9590", "9592", "9593", "9594", "9595", 
"9596", "9597", "9598", "9599", "9600", "9601"), row.names = 324:326, class = "data.frame")

I want to check columns 3 to 22 and remove any row in which one of those cells contains more than one value, i.e. the third row (row name 326) should be removed.

I have tried:

my.desired.data <- <- my.data[!apply(my.data[,3:22], 1, function(x) {any(as.character(nchar(x))) != 1}),]

which I thought should work, but there appears to be a problem in my code that I can't spot at the moment.
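For comparison, here is a minimal sketch of the apply() attempt with its bugs fixed, assuming (as in the example data) that a single value is always exactly one character long. The attempt has three problems: the doubled `<-` is a syntax error; as.character(nchar(x)) hands any() a character vector, which any() rejects with an "invalid 'type' (character)" error; and the `!= 1` comparison sits outside any(), so it would compare any()'s result rather than each character count. The data-building lines are a compact, hypothetical stand-in for the dput() output above (alts is stored as plain character rather than as a factor); columns 3 to 22 hold the same values.

```r
# Hypothetical compact stand-in for the question's my.data:
# columns 3:22 match the dput() output; alts is plain character, not a factor.
ids <- c("3115", "3124", "9582", "9583", "9584", "9585", "9586", "9587",
         "9588", "9590", "9592", "9593", "9594", "9595", "9596", "9597",
         "9598", "9599", "9600", "9601")
samples <- lapply(ids, function(id) {
  # Every sample column reads "C", "T", "C,G" except 9584 and 9594
  if (id %in% c("9584", "9594")) c("C", "T", "G") else c("C", "T", "C,G")
})
names(samples) <- ids
my.data <- data.frame(alts = c("C", "T", "c(\"C,G\", \"G\")"),
                      Coordinate = c(13687520L, 13687570L, 13687591L),
                      samples, check.names = FALSE,
                      stringsAsFactors = FALSE, row.names = 324:326)

# Fixed attempt: one assignment arrow, and the != 1 comparison moved inside
# any() so that any() receives a logical vector.
# apply() converts columns 3:22 to a character matrix and visits each row.
has.multiple <- apply(my.data[, 3:22], 1, function(x) any(nchar(x) != 1))
my.desired.data <- my.data[!has.multiple, ]
rownames(my.desired.data)  # expected: "324" "325"
```

Testing nchar(x) != 1 flags any cell that is not exactly one character long, whatever separator is used; the answer below instead looks specifically for a comma.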

We loop through columns 3:22 with lapply and use grepl to check each column for a comma (','), which returns a list of logical vectors. Reduce with | combines them element-wise into a single logical vector that is TRUE for rows with a comma in at least one of those columns, and negating it with ! gives TRUE for rows whose cells all contain a single value. This logical vector can then be used to subset 'my.data'.

my.data[!Reduce(`|`,lapply(my.data[3:22], grepl, pattern=',')),]
# alts Coordinate 3115 3124 9582 9583 9584 9585 9586 9587 9588 9590 9592 9593
#324    C   13687520    C    C    C    C    C    C    C    C    C    C    C    C
#325    T   13687570    T    T    T    T    T    T    T    T    T    T    T    T
#    9594 9595 9596 9597 9598 9599 9600 9601
#324    C    C    C    C    C    C    C    C
#325    T    T    T    T    T    T    T    T
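To see what each piece of that one-liner does, the sketch below runs it step by step and prints the intermediate values. As before, the data-building lines are a compact, hypothetical stand-in for the dput() output in the question (alts is stored as plain character rather than as a factor); columns 3 to 22 hold the same values. Like the one-liner, it assumes that multiple values in a cell are always separated by a comma.

```r
# Hypothetical compact stand-in for the question's my.data:
# columns 3:22 match the dput() output; alts is plain character, not a factor.
ids <- c("3115", "3124", "9582", "9583", "9584", "9585", "9586", "9587",
         "9588", "9590", "9592", "9593", "9594", "9595", "9596", "9597",
         "9598", "9599", "9600", "9601")
samples <- lapply(ids, function(id) {
  # Every sample column reads "C", "T", "C,G" except 9584 and 9594
  if (id %in% c("9584", "9594")) c("C", "T", "G") else c("C", "T", "C,G")
})
names(samples) <- ids
my.data <- data.frame(alts = c("C", "T", "c(\"C,G\", \"G\")"),
                      Coordinate = c(13687520L, 13687570L, 13687591L),
                      samples, check.names = FALSE,
                      stringsAsFactors = FALSE, row.names = 324:326)

# Step 1: one logical vector per column, TRUE where the cell contains a comma
# (each column is passed to grepl() as its x argument)
has.comma <- lapply(my.data[3:22], grepl, pattern = ",")
has.comma[["3115"]]  # expected: FALSE FALSE TRUE
has.comma[["9584"]]  # expected: FALSE FALSE FALSE

# Step 2: OR the 20 vectors together element-wise;
# TRUE where at least one column in that row has a comma
any.comma <- Reduce(`|`, has.comma)
any.comma  # expected: FALSE FALSE TRUE

# Step 3: negate and subset, keeping rows whose cells all hold a single value
my.desired.data <- my.data[!any.comma, ]
rownames(my.desired.data)  # expected: "324" "325"
```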
