
How to convert numbers with multiple dots from character to numeric in R?

I have a vector x as follows.

x= c("44.431.974.113.935", "-0.9780789132588046", "127.136.409.640.697", 
 "-5.510.222.665.234.440", "4.254.952.168.752.070", "0.9009379347023327")

The tricky part is that only the first dot is meaningful (it is the decimal point); the rest are not. So I need x to come back as:

[1] 44.43 -0.97 127.13 -5.51 4.25 0.9

I tried gsub without success. I could not work out how to write the pattern so that it skips the first dot and removes the rest.

There has to be a prettier way, but something like this should work:

gsub("^(.*?[.].*)?[.].*", "\\1", x)
## [1] "44.431"              "-0.9780789132588046" "127.136"            
## [4] "-5.510"              "4.254"               "0.9009379347023327" 

Wrap it in as.numeric to get numeric values, and round to two decimals:

round(as.numeric(gsub("^(.*?[.].*)?[.].*", "\\1", x)), 2)
## [1]  44.43  -0.98 127.14  -5.51   4.25   0.90
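The pattern above mixes a lazy .*? with a greedy .*, and regex engines do not all resolve such mixes the same way, so it may not give the same result with perl = TRUE. A sketch of a pattern that sidesteps this by using only negated character classes [^.], which cannot cross a dot and therefore match identically in either engine. Like the original, it keeps everything up to the second dot and drops the rest; the name kept is only illustrative.

```r
x <- c("44.431.974.113.935", "-0.9780789132588046", "127.136.409.640.697",
       "-5.510.222.665.234.440", "4.254.952.168.752.070", "0.9009379347023327")

# "non-dots . non-dots" followed by a second dot: keep the first part,
# drop the second dot and everything after it.
# Strings with only one dot do not match and are returned unchanged.
kept <- sub("^([^.]*[.][^.]*)[.].*", "\\1", x)
kept
# "44.431" "-0.9780789132588046" "127.136" "-5.510" "4.254" "0.9009379347023327"

round(as.numeric(kept), 2)
# 44.43 -0.98 127.14 -5.51 4.25 0.90
```

Because the character classes leave no room for backtracking choices, adding perl = TRUE to the sub call should not change the result.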

If you want to remove every dot but the first, one trick is to replace the first dot with a comma, remove the remaining dots, and then turn the comma back into a dot. Something like:

sub(",",".",gsub(".","",sub(".",",",x,fixed=TRUE),fixed=TRUE),fixed=TRUE)
#[1] "44.431974113935"     "-0.9780789132588046" "127.136409640697"   
#[4] "-5.510222665234440"  "4.254952168752070"   "0.9009379347023327"

Then call as.numeric and round as you wish.
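A sketch of that final conversion step, assuming the strings contain no commas of their own (otherwise the comma placeholder would collide with them). Note that round() gives -0.98 and 127.14, whereas the output the question asks for (-0.97, 127.13, 0.9) is what you get by truncating toward zero at two decimals, so both variants are shown. The names cleaned and num are only illustrative.

```r
x <- c("44.431.974.113.935", "-0.9780789132588046", "127.136.409.640.697",
       "-5.510.222.665.234.440", "4.254.952.168.752.070", "0.9009379347023327")

# First dot -> comma placeholder, drop all remaining dots,
# then turn the placeholder back into the decimal point.
cleaned <- sub(",", ".",
               gsub(".", "", sub(".", ",", x, fixed = TRUE), fixed = TRUE),
               fixed = TRUE)
num <- as.numeric(cleaned)

round(num, 2)            # to nearest: 44.43 -0.98 127.14 -5.51 4.25 0.90
trunc(num * 100) / 100   # toward zero: 44.43 -0.97 127.13 -5.51 4.25 0.90
```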

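Using str_extract from stringr, keeping at most two digits after the first dot (this truncates rather than rounds, which matches the expected output in the question):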

library(stringr)
# optional minus sign, integer part, first dot, then up to two digits
as.numeric(str_extract(x, "-?\\d+\\.\\d{0,2}"))
#[1]  44.43  -0.97 127.13  -5.51   4.25   0.90
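If you would rather not depend on stringr, base R's regexpr() plus regmatches() can extract the same prefix. A sketch; one caveat is that regmatches() silently drops elements with no match, so the result can end up shorter than x if some string lacks a dot. The stopifnot() guard below is an added safety check, not part of the original answer.

```r
x <- c("44.431.974.113.935", "-0.9780789132588046", "127.136.409.640.697",
       "-5.510.222.665.234.440", "4.254.952.168.752.070", "0.9009379347023327")

# Optional minus sign, integer part, the first dot, then at most two digits.
m <- regexpr("-?[0-9]+\\.[0-9]{0,2}", x)
stopifnot(all(m != -1))  # every element must match, or lengths would diverge

vals <- as.numeric(regmatches(x, m))
vals
# 44.43 -0.97 127.13 -5.51 4.25 0.90
```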

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM