
Strange Greater Than Logic with Numbers in Character Vector

I was working on a script today, and noticed some very unexpected outputs. Upon inspection, I found that one variable in my dataset, which should always be numeric, has one character value (essentially one cell with a typed "N/A" rather than a value properly read in as NA). This is not really a problem, as I can manually re-code this value as NA. What I am curious about is why I did not receive an error while indexing on this vector, and how to interpret the output. An example is provided below:

c("56.2", "84.7", "63", "9", "109.5", "16", "N/A", "50") >= 50

Results in the output:

TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE

The logic behind which entries are marked as TRUE or FALSE is not immediately obvious to me. Could anyone provide an explanation?

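Because one side of the comparison is a character vector, R converts the number 50 to the string "50" and compares strings. Strings are compared character by character in collation (roughly alphabetical) order, in which digits come before letters. "109.5" starts with a 1, which comes before the 5 in "50", so it counts as "smaller" (earlier in order). "9" comes after "5", so it counts as "larger", and "N/A" starts with a letter, so it sorts after every string that starts with a digit.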

"ab" > "b"
# a comes before b
# [1] FALSE

"12" > "2"
# 1 comes before 2 as character
# [1] FALSE
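
To check this against the vector from the question, here is a small sketch. It assumes a typical collation in which digits sort before letters; string ordering in R depends on the locale, so treat it as an illustration rather than a guarantee.

```r
x <- c("56.2", "84.7", "63", "9", "109.5", "16", "N/A", "50")

# The numeric 50 is coerced to the string "50", so both comparisons agree
identical(x >= 50, x >= "50")

# Strings are compared character by character from the left
"9" >= "50"      # "9" sorts after "5"
"109.5" >= "50"  # "1" sorts before "5"
"N/A" >= "50"    # letters sort after digits
```

Only the first differing character decides the result, which is why "9" counts as larger than "50" while "109.5" counts as smaller.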

Additional note to the explanation by Merjin van Tilborg:

x <- c("56.2", "84.7", "63", "9", "109.5", "16", "N/A", "50") 
x >= 50

# gives
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE

# Now check which indexes fulfill this comparison (the why is explained by Merjin van Tilborg)
which(x >= 50)
[1] 1 2 3 4 7 8

# if you convert the vector to numeric before comparing:
as.numeric(x) >= 50

# you get:
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE    NA  TRUE
Warning message:
  NAs introduced by coercion
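
One way to get the numeric comparison without the warning is to recode the typed "N/A" as a real NA before converting. A minimal sketch, where x_num is only an illustrative name:

```r
x <- c("56.2", "84.7", "63", "9", "109.5", "16", "N/A", "50")

# Recode the typed "N/A" as a real missing value, then convert
x[x == "N/A"] <- NA
x_num <- as.numeric(x)

# Numeric comparison; the missing entry stays NA
x_num >= 50

# which() skips NA, so only positions that are truly >= 50 are returned
which(x_num >= 50)
```

If the data comes from a file, read.csv() and related functions accept an na.strings argument, for example na.strings = c("NA", "N/A"), so the value can be read in as NA in the first place.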
