
Ignoring NAs in a dataframe when finding unique rows

I have a dataframe with 20 columns and about 200 rows, and I would like to find the unique rows. The problem is that nearly every row has a few NAs mixed in. These are genuinely missing data, and I would like each NA to be treated as a "wildcard" rather than being matched against other NAs.

The following two rows should be recognized as a match (i.e. non-unique):

T, S, NA, Z
NA, S, G, Z

I've tried the incomparables argument to the unique function, but it doesn't seem to be implemented. Thanks a lot.

Put this in a double for loop:

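all(x[1, ] == x[2, ], na.rm = TRUE)

Replace 1 and 2 with i and j to cycle through every pair of rows you need to compare. Passing na.rm = TRUE to all() discards the NA comparisons element by element, so only the positions where both rows have a value are checked.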
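As a rough sketch of how that double loop might look in full, the snippet below wraps the comparison in two helpers. The names rows_match and unique_wildcard are hypothetical (they are not from the thread), and the example assumes character columns created with stringsAsFactors = FALSE. Wildcard matching is not transitive, so which rows survive can depend on row order. Two rows with no overlapping non-NA positions are also treated as a match here.

```r
# Hypothetical helper: TRUE when rows i and j agree wherever both are non-NA.
rows_match <- function(x, i, j) {
  all(x[i, ] == x[j, ], na.rm = TRUE)
}

# Hypothetical helper: keep a row only if it matches none of the rows kept so far.
unique_wildcard <- function(x) {
  keep <- logical(nrow(x))
  for (i in seq_len(nrow(x))) {
    is_dup <- FALSE
    for (j in which(keep)) {
      if (rows_match(x, i, j)) {
        is_dup <- TRUE
        break
      }
    }
    keep[i] <- !is_dup
  }
  x[keep, , drop = FALSE]
}

# Small example: the first two rows are the pair from the question.
x <- data.frame(
  V1 = c("T", NA, "S"),
  V2 = c("S", "S", "G"),
  V3 = c(NA, "G", "Z"),
  V4 = c("Z", "Z", "T"),
  stringsAsFactors = FALSE
)
res <- unique_wildcard(x)
```

Here the second row is dropped because it agrees with the first at every position where both have a value, and the third row is kept.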

You could try

# Build one key per row from its non-NA values, then drop rows with a repeated key
val <- apply(df, 1, function(x) paste(na.omit(x), collapse = ""))
df[!duplicated(val), ]
#    V1   V2   V3 V4
#1    T    S <NA>  Z
#2 <NA>    S    G  Z
#3    S    G    Z  T
#4    S <NA>    Z  G

data

 df <- structure(list(V1 = c("T", NA, "S", "S", "S"), V2 = c("S", "S", 
 "G", NA, "G"), V3 = c(NA, "G", "Z", "Z", NA), V4 = c("Z", "Z", 
 "T", "G", "Z")), .Names = c("V1", "V2", "V3", "V4"), row.names = c(NA, 
 -5L), class = "data.frame")
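To see which rows the paste-based approach treats as duplicates, it can help to print the keys it builds. The sketch below rebuilds the same example values with data.frame() so that it runs on its own. Note that this approach drops NAs from the key rather than treating them as wildcards, so the two rows from the question get different keys ("TSZ" and "SGZ") and are not merged.

```r
# Same values as the example data above, rebuilt so this block is self-contained.
df <- data.frame(
  V1 = c("T", NA, "S", "S", "S"),
  V2 = c("S", "S", "G", NA, "G"),
  V3 = c(NA, "G", "Z", "Z", NA),
  V4 = c("Z", "Z", "T", "G", "Z"),
  stringsAsFactors = FALSE
)

# One key per row: the non-NA values pasted together in column order.
val <- apply(df, 1, function(x) paste(na.omit(x), collapse = ""))
keys <- unname(val)
# keys: "TSZ" "SGZ" "SGZT" "SZG" "SGZ"

# A row is flagged only when its key repeats an earlier key.
dup <- duplicated(val)
```

Only the fifth row repeats an earlier key (that of the second row), so it is the only one flagged.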
