
How to check whether a column contains only identical elements in R?

Example data:

x <- matrix(c("Stack","Stack","Stack",
              "Overflow","Overflow","wolfrevO"),
            nrow=3,ncol=2)

How can I check whether x[,1] contains entirely identical elements?

If x contains NAs, does this method still apply?

Thanks

You can compare the vector's first value to the rest of the vector.

all(x[-1, 1] == x[1, 1])
# [1] TRUE

If NA values are present, this exact method no longer works, because any comparison with NA yields NA. This is easily fixed with na.omit(). For example:

## create a vector with an NA value
x2 <- c(x[, 1], NA)

## standard check returns NA
all(x2 == x2[1])
# [1] NA

## call na.omit() to remove, then compare
all(na.omit(x2) == x2[1])
# [1] TRUE

So, with your matrix x, this last line would become

all(na.omit(x[-1, 1]) == x[1, 1])
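Note that the line above still uses x[1, 1] as the reference value, so it returns NA if that first element is itself NA. A minimal sketch of a variant that drops the NAs before picking the reference value (the helper name all_same_ignoring_na is hypothetical, not part of base R):

```r
# Hypothetical helper: TRUE when all non-NA values of v are identical.
# Assumption: an empty or all-NA vector counts as "all identical".
all_same_ignoring_na <- function(v) {
  v <- v[!is.na(v)]                      # drop NAs so the reference value is never NA
  length(v) <= 1 || all(v[-1] == v[1])   # compare the rest against the first value
}

all_same_ignoring_na(c(NA, "Stack", "Stack"))     # TRUE
all_same_ignoring_na(c("Stack", NA, "Overflow"))  # FALSE
```

Whether an all-NA column should count as identical is a design choice; change the length(v) <= 1 branch if you want a different answer there.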

You can count the unique elements of the column:

length(unique(x[,1]))==1

This also works if there are NAs in your data, in the sense that it returns TRUE or FALSE rather than NA.

For checking every column use:

apply(x, 2, function(a) length(unique(a))==1)
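One caveat: unique() keeps NA as a value of its own, so a column that mixes one repeated value with NAs is reported as not identical. A small sketch of both behaviours, assuming you may want to ignore NAs (the vector v is made up for illustration):

```r
v <- c("Stack", "Stack", NA)

# NA counts as a distinct value: unique(v) is c("Stack", NA)
length(unique(v)) == 1              # FALSE

# Dropping NAs first compares only the observed values
length(unique(v[!is.na(v)])) == 1   # TRUE
```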

You can use the duplicated function for this:

If sum(!duplicated(x[,1]))==1 returns TRUE, the column contains only identical values.

sum(!duplicated(x[,1]))==1
# [1] TRUE

sum(!duplicated(x[,2]))==1
# [1] FALSE

If x contains NAs, this method still works, in the sense that an all-NA column returns TRUE and a mixed column returns FALSE.

x <- matrix(c(NA,NA,NA,"Overflow","Overflow",NA),nrow=3,ncol=2)

sum(!duplicated(x[,2]))==1
# [1] FALSE

sum(!duplicated(x[,1]))==1
# [1] TRUE

If you want to see which elements are duplicated and how many times, you can use table.

table(x[,1])
# Stack 
# 3 

table(x[,2])
# Overflow wolfrevO 
#    2        1 

To see if there's only one unique value in a column, use dim.

dim(table(x[,1])) == 1
# [1] TRUE
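Be aware that table() drops NA by default, so the dim() check ignores missing values unless you ask for them with the useNA argument. A short sketch (the vector v is made up for illustration):

```r
v <- c("Stack", "Stack", NA)

# Default: NA is not tabulated, so only "Stack" is counted
dim(table(v)) == 1                    # TRUE

# useNA = "ifany" adds an NA cell when NAs are present
dim(table(v, useNA = "ifany")) == 1   # FALSE
```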

I agree with @Richard Scriven for characters, factors, etc. (all(x[-1, 1] == x[1, 1])).

For comparing numeric values, however, a more robust approach that tolerates floating-point rounding can be useful (x must be numeric here):

all.same <- function (x) {
    # TRUE when the range of x is within a few machine epsilons
    # (.Machine$double.eps * 4 is about 8.881784e-16)
    abs(max(x) - min(x)) < .Machine$double.eps * 4
}
apply(x, 2, all.same)
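To illustrate why the tolerance matters, here is a small sketch with two values that are equal on paper but differ in floating point. The function name all_same_num and its tol parameter are assumptions for this example; the default tolerance is the same constant as above.

```r
# Hypothetical tolerance-based check for numeric vectors.
all_same_num <- function(x, tol = .Machine$double.eps * 4) {
  abs(max(x) - min(x)) < tol
}

v <- c(0.3, 0.1 + 0.2)   # 0.1 + 0.2 is not exactly 0.3 in double precision

all(v == v[1])    # FALSE: exact comparison sees the rounding error
all_same_num(v)   # TRUE: the difference is below the tolerance
```

For values far from 1 in magnitude, a fixed absolute tolerance may be too strict or too loose, so tol probably needs adjusting to the scale of your data.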

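A comparison of the proposed methods (all(duplicated(x)) is used here only to time the duplicated() call; it is not itself a valid check, since the first element is never marked as a duplicate):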

x <- rep(1, 1000)
x[5] <- 0

microbenchmark::microbenchmark(
  all(duplicated(x)), 
  length(unique(x)) == 1, 
  dim(table(x)) == 1, 
  all(x == x[1]),
  times = 1000)

Unit: microseconds
                   expr      min       lq        mean   median       uq      max neval cld
     all(duplicated(x))   19.594   21.461   24.688356   22.861   24.727   74.646  1000  b 
 length(unique(x)) == 1   21.461   23.793   26.972993   25.193   26.127  156.755  1000  b 
     dim(table(x)) == 1 1067.422 1090.282 1144.309131 1123.872 1154.197 2072.795  1000   c
         all(x == x[1])    3.267    4.199    4.629929    4.200    4.666   22.394  1000 a  

Here x is a single column or row. For a matrix, data.frame or similar, the same test can be applied per row (or per column, with MARGIN = 2). The following returns TRUE only if every row of X is constant:

all(apply(X, 1, function(x){all(x == x[1])}))
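For a data.frame with mixed column types, apply() first converts the data to a matrix, which can turn numbers into strings. A possible per-column alternative, sketched here with vapply() on a made-up data frame df, keeps each column's own type:

```r
# Example data (made up for illustration)
df <- data.frame(a = c(1, 1, 1),
                 b = c("x", "y", "x"),
                 stringsAsFactors = FALSE)

# One logical per column: TRUE when every element equals the first one
same_per_col <- vapply(df, function(col) all(col == col[1]), logical(1))
same_per_col
# a is TRUE, b is FALSE
```

Wrap the result in all() if you only need a single answer for the whole data frame.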
