Regex issue in R with gsub - reformat string vector to numeric

I am trying to take a character vector of poorly formatted dollar values and turn it into a numeric vector. The values look like the vector below: each has a leading and a trailing space, and some also contain a thousands-separator comma or a dollar sign:

x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")

I am trying to use the following function to get rid of all characters other than the digits and the dot, but it isn't working:

trim_currency <- function(x) grep("\$([0-9.]*)\,([0-9.]*)", x, values=TRUE)

I got the regex code

\$([0-9.]*)\,([0-9.]*)

to run successfully with this regex tester http://regex101.com/r/qM2uG0

When I run it in R, I get the following error:

Error: '\$' is an unrecognized escape in character string starting "\$"

Any ideas about how I can do this in R?
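For reference, the error comes from R's string-literal parser, not from the regex engine: "\$" is not a valid escape inside an R string, so the backslash itself has to be doubled ("\\$") for the regex engine to receive \$. R 4.0.0 and later also accept raw strings such as r"(\$)". Separately, grep() only returns the matching elements rather than modified strings, and its argument is value = TRUE; because grep() has no ... argument, values = TRUE fails with an "unused argument" error. The sketch below (variable names are illustrative) shows the escaped pattern with gsub() and grep(), plus the fixed = TRUE alternative that avoids regex escaping altogether:

```r
x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")

# In an R string literal the backslash must itself be escaped: "\\$" is the
# two-character regex \$, which matches a literal dollar sign.
dollar_pattern <- "\\$"

# gsub() replaces every match; grep() only reports which elements match
# (note the argument name is value, not values).
no_dollar <- gsub(dollar_pattern, "", x)
matching <- grep(dollar_pattern, x, value = TRUE)

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed.
no_dollar_fixed <- gsub("$", "", x, fixed = TRUE)

print(no_dollar)
print(matching)
```

Only the dollar signs are removed here; the spaces and commas are still present, which is why the more general approach in the answer uses a character class.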


Thanks to ndoogan for his response; it solves this particular issue. However, to make the approach more general, I would ask:

How could I use R/regex to run a vector through a filter, allowing only the digits and periods to come through?

x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")
gsub("[,$ ]","",x)
#[1] "18000.50" "1240.30"  "125.00"

To remove other characters as well, add them inside the brackets. This assumes the example x shows every unwanted character.
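As a sketch of extending the bracket list, assume some hypothetical messier input (not from the question) with a trailing "USD" and tab or newline whitespace. Inside [...] the $ is an ordinary character, so it needs no backslash, and the POSIX class [:space:] covers tabs and newlines as well as spaces:

```r
# Hypothetical messier input: a trailing "USD" plus tab/newline whitespace.
y <- c(" 18,000.50 USD", " $1,240.30 ", "\t$125.00\n")

# Inside [...] the $ is literal. U, S and D are listed as single characters,
# and [:space:] matches spaces, tabs and newlines.
cleaned <- gsub("[,$USD[:space:]]", "", y)
print(cleaned)
```

Because a bracket expression lists single characters, this removes every U, S, or D wherever it appears, not just the word "USD"; that is harmless for numeric strings but worth keeping in mind.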

Update

If you know you're only interested in numeric digits and decimal points, then you could do this:

gsub("[^0-9.]","",x)
#[1] "18000.50" "1240.30"  "125.00"

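A ^ as the first character inside the square brackets negates the class, so [^0-9.] matches every character that is not a digit or a period, and gsub() replaces each such character with the empty string.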
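One consequence of keeping only digits and periods is that anything else you might need, such as a minus sign, is discarded too. The sketch below uses a hypothetical negative amount (not in the question's data) and shows that adding "-" to the negated class keeps the sign:

```r
# Hypothetical negative amount: "-" is neither a digit nor a period.
neg <- " -$50.00 "
dropped_sign <- gsub("[^0-9.]", "", neg)

# Adding "-" to the negated class keeps it. A hyphen placed last inside
# the brackets is taken literally rather than as a range.
kept_sign <- gsub("[^0-9.-]", "", neg)

print(dropped_sign)
print(as.numeric(kept_sign))
```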

Finally, to convert the results to numbers, wrap the gsub() call (or an object holding its output) in as.numeric():

as.numeric(gsub("[^0-9.]","",x))
#[1] 18000.5  1240.3   125.0
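Putting the pieces together, a minimal sketch of the question's trim_currency() rebuilt on this approach (gsub() with the negated class, then as.numeric()) could look like this:

```r
# Sketch of the question's trim_currency() using the answer's approach.
trim_currency <- function(x) {
  # Drop every character that is not a digit or a period, then convert.
  as.numeric(gsub("[^0-9.]", "", x))
}

x <- c(" 18,000.50 ", " $1,240.30 ", " $125.00 ")
amounts <- trim_currency(x)
print(amounts)
print(sum(amounts))
```

A string that still contains more than one period after cleaning (for example "1.2.3") cannot be parsed, so as.numeric() returns NA for it with an "NAs introduced by coercion" warning.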


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM