
Why does dplyr filter_all using all_vars > 0 work on a character string?

Given:

df <- structure(list(word = c("aaliyahmaxwell", "abasc", "abbslovesfed", 
"abbycastro", "abc", "abccarpet", "abdul", "ability", "abnormile", 
"abraham"), chardonnay = c(4, 0, 0, 0, 0, 0, 0, 0, 0, 0), coffee = c(0, 
1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("word", "chardonnay", 
"coffee"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))

Why does df %>% filter_all(all_vars(. > 0)) work?

I mean that my first column is of type character and can't be > 0. I can understand why it works on the other two columns but need an explanation on why it works when I have a mixture of character and double type columns.

Please advise.

It is due to type coercion. Here, the numeric 0 is converted to the character string "0". According to ?Comparison:

If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.

df %>%
   filter(word > 0)

This returns all the rows of the original data, because

letters > 0
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#[26] TRUE

In the 'word' column, every value is a character string, and after coercion each of them compares greater than "0". That leaves all_vars effectively checking only whether the numeric columns are greater than 0.
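To make the coercion concrete, here is a small base R sketch. The vector name words and its values are illustrative assumptions, not the original data.

```r
# Illustrative vector (hypothetical, not the original data)
words <- c("abc", "abdul", "ability")

# The numeric 0 is coerced to the string "0" before the comparison
coerced_zero <- as.character(0)

# Comparing against 0 and against "0" therefore gives the same result
same_result <- identical(words > 0, words > coerced_zero)

# Every word sorts after "0", so the character column never excludes a row
all_greater <- all(words > 0)
```

Both same_result and all_greater should be TRUE here, which is why the character column has no effect on which rows are kept.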


In the OP's example dataset, none of the rows match the criteria because in every row one of the numeric columns is 0. If we change the first row of 'coffee' to 2 (or 1), that row is picked up: 'chardonnay' is greater than 0 there, and the 'word' column always compares greater.

df$coffee[1] <- 2
df %>%
    filter_all(all_vars(. > 0))
# A tibble: 1 x 3
#  word           chardonnay coffee
#   <chr>               <dbl>  <dbl>
#1 aaliyahmaxwell          4      2

To apply the condition to the numeric columns only, use filter_if (as suggested in the comments):

df %>% 
   filter_if(is.numeric, all_vars(. > 0))
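In more recent dplyr releases (roughly 1.0.4 and later, as far as I know), the scoped verbs such as filter_if are superseded, and the same idea can be written with if_all() and where(). The sketch below assumes such a version is installed and uses a small made-up tibble rather than the OP's data.

```r
library(dplyr)

# Small made-up tibble (hypothetical, not the OP's data)
df_small <- tibble(
  word       = c("abc", "abdul", "ability"),
  chardonnay = c(4, 0, 2),
  coffee     = c(0, 1, 3)
)

# Keep rows where every numeric column is greater than 0;
# the character column is never compared at all
numeric_only <- df_small %>%
  filter(if_all(where(is.numeric), ~ .x > 0))
```

Only the "ability" row has both numeric columns above 0, so it is the only row kept.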

Even though there is already a good answer, I think this can be made clearer with an example:

> c("a", 0)
[1] "a" "0"

Here you can see what happens: the number gets coerced to a character string.

Character strings are compared lexically. Example:

> "b" > "a" 
[1] TRUE

> "a" > "5"
[1] TRUE

> charvector <- sample(c(seq(1,9), LETTERS))
> charvector
 [1] "6" "D" "T" "U" "I" "R" "F" "S" "J" "W" "B" "A" "8" "E" "2" "7" "O" "Z" "V" "G" "9" "4" "H" "C" "Y" "1" "X" "5" "M" "K" "Q" "L" "N" "3" "P"

The order also becomes clear when you sort that vector:

> sort(charvector)
 [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
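A practical consequence of this lexical ordering, sketched in base R with values chosen purely for illustration: numbers stored as strings do not compare the way their numeric values do.

```r
# As numbers, 10 is greater than 9
numeric_cmp <- 10 > 9          # TRUE

# As strings, "10" sorts before "9" because "1" comes before "9"
string_cmp <- "10" > "9"       # FALSE

# Mixing types coerces the number to a string, so this is also FALSE
mixed_cmp <- "10" > 9          # FALSE

# Sorting strings follows the same character-by-character order
sorted_strings <- sort(c("10", "9", "2"))   # "10" "2" "9"
```

This is one more reason to restrict numeric conditions to numeric columns instead of relying on the coercion.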
