
Extract valid numbers from character vector in R

Suppose I have the following character vector:

c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

I want to extract only the valid numbers from this vector:

c("4", "-21", "6.5", "-2.2")

Note: "7. 5" contains a space between the "." and the "5", so it is not a valid number.

I tried the regex /^-?(0|[1-9]\\d*)(\\.\\d+)?$/, which is given here, but had no luck.

What regex would extract the valid numbers from a character vector?

as.numeric already does a great job of this. Any string that is a valid number is coerced to numeric; everything else becomes NA (R also emits an "NAs introduced by coercion" warning).

x = c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
y = as.numeric(x)
y = y[!is.na(y)]
y
# [1]   4.0 -21.0   6.5  -2.2
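The question asks for the matching strings rather than their numeric values. A minimal sketch of that variant, assuming the same input vector, uses the coercion result only as a filter (the names is_num and valid are illustrative, not from the original answer). Note that as.numeric also accepts forms such as "1e3" or "0x1A", so this filter is broader than a plain decimal regex.

```r
x <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

# Coerce once; suppressWarnings() hides the "NAs introduced by coercion" warning.
is_num <- !is.na(suppressWarnings(as.numeric(x)))

# Index the original character vector so the result stays character.
valid <- x[is_num]
valid
# [1] "4"    "-21"  "6.5"  "-2.2"
```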

We can use grep with a pattern that, from the start ( ^ ) to the end ( $ ) of the string, allows an optional leading - followed only by digits and . characters:

v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
grep("^-?[0-9.]+$", v1, value = TRUE)
[1] "4"    "-21"  "6.5"  "-2.2"
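One caveat with this pattern: the character class [0-9.] accepts any mix of digits and dots, so strings that are not numbers can slip through. A small check, using made-up test strings that are not in the original question, illustrates this:

```r
# Hypothetical inputs chosen to probe the pattern; none is a valid number
# in the sense of the question.
tricky <- c("4.1.1", "..", "5.")

# All three match, because the class allows any number of dots anywhere.
loose_hits <- grep("^-?[0-9.]+$", tricky, value = TRUE)
loose_hits
# [1] "4.1.1" ".."    "5."
```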

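Or, to handle fringe cases such as "4.1.1" (more than one decimal point), allow only a single optional fractional part. This pattern also permits one leading space in place of the minus sign: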

grep("^[ -]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1"), value = TRUE)
[1] "4"    "-21"  "6.5"  "-2.2"

grep("^[ -]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1", " 2.9"), value = TRUE)
[1] "4"    "-21"  "6.5"  "-2.2" " 2.9"
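If a leading space should not count as valid, as the "7. 5" note in the question suggests, the pattern can be tightened. The following is a sketch under that assumption; extract_numbers is a hypothetical helper name, not part of base R or of the answers above.

```r
# Hypothetical helper: keep only strings of the form [-]digits[.digits],
# with no surrounding whitespace.
extract_numbers <- function(x) {
  # Optional minus, one or more digits, then at most one ".digits" part.
  x[grepl("^-?[0-9]+(\\.[0-9]+)?$", x)]
}

v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
strict <- extract_numbers(c(v1, "4.1.1", " 2.9", ".", "5."))
strict
# [1] "4"    "-21"  "6.5"  "-2.2"
```

Unlike the as.numeric approach, this rejects scientific notation and hexadecimal strings, which may or may not be what is wanted.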
