Using gsub() to replace all numbers except after certain substrings

Say we have:

TestStrings <- c("Some number < 100", "Some number > 999", "Some number $1000", "Some number 1000000")

I want to replace all numbers with a space except numbers following the substrings:

"< \\d+"   "> \\d+"   "$\\d+"

What regular expression could I pass to gsub() to accomplish this?

I know the following code is wrong, but here is what I have so far:

gsub(pattern = "^> \\d+|^< \\d+|^$\\d+", replace = " ", TestStrings)

We can use the following pattern:

[a-z]\s*\K\d+

In R it would be:

gsub("[a-z]\\s*\\K\\d+", "", TestStrings, perl = TRUE)

 # [1] "Some number < 100"   "Some number > 999"
 # [3] "Some number $1000"   "Some number "
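
To make the role of \K a bit more concrete, here is a minimal sketch. The extra input "1000000 at the start" is my own assumption and is not from the question. \K drops everything matched so far from the reported match, so the letter and the whitespace are required but are not replaced. The flip side is that a number with no lowercase letter directly before it (optionally separated by whitespace) is left alone.

```r
# Illustrative inputs; the second string is a made-up edge case.
x <- c("Some number 1000000", "1000000 at the start", "Some number $1000")

# [a-z]\s* must match first; \K then resets the start of the match,
# so only the digits are replaced by the empty string.
result <- gsub("[a-z]\\s*\\K\\d+", "", x, perl = TRUE)
print(result)
# The leading number in the second string is untouched because no letter
# precedes it; "$1000" is untouched because "$" sits between the space and the digits.
```

So this pattern works for the sample data, but it depends on every unwanted number being preceded by a letter.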

Perhaps this helps:

gsub("[<>] \\d+(*SKIP)(*FAIL)|\\d+", " ", TestStrings, perl = TRUE)
#[1] "Some number < 100" "Some number > 999" "Some number $ "    "Some number  "

If the $ should be removed as well:

gsub("[<>] \\d+(*SKIP)(*FAIL)|\\$*\\d+", " ", TestStrings, perl = TRUE)
#[1] "Some number < 100" "Some number > 999" "Some number  "     "Some number  "    

If we need to keep both the $ and the number that follows it:

gsub("([<>] |\\$)\\d+(*SKIP)(*FAIL)|\\d+", " ", TestStrings, perl = TRUE)
#[1] "Some number < 100" "Some number > 999" "Some number $1000" "Some number  "    
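
For reuse, the last pattern can be wrapped in a small helper. This is only a sketch: the function name blank_unprotected_numbers is hypothetical, and I swapped the capturing group for a non-capturing (?:...) group, which should not change the result. The comments spell out how the two halves of the alternation cooperate.

```r
TestStrings <- c("Some number < 100", "Some number > 999",
                 "Some number $1000", "Some number 1000000")

# Hypothetical helper name, not part of any package.
blank_unprotected_numbers <- function(x) {
  # Left branch: a protected number ("< 100", "> 999", "$1000").
  #   (*SKIP)(*FAIL) throws that match away and tells PCRE not to
  #   retry from any position inside it.
  # Right branch: any other run of digits, which is what gets replaced.
  pattern <- "(?:[<>] |\\$)\\d+(*SKIP)(*FAIL)|\\d+"
  gsub(pattern, " ", x, perl = TRUE)
}

result <- blank_unprotected_numbers(TestStrings)
print(result)
```

Note that perl = TRUE is required here, since (*SKIP) and (*FAIL) are PCRE verbs and are not understood by R's default regex engine.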

What about this:

gsub("[<>\\$] ?\\d+", " ", TestStrings)

It returns:

[1] "Some number  "       "Some number  "       "Some number  "       "Some number 1000000"

which I think is what you are looking for.

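EDIT: Actually you want the opposite, so capture the protected matches and put them back with "\\1". A bare number matches the second alternative, where the group is empty, so it is deleted (not replaced with a space):

gsub("([<>\\$] ?\\d+)|\\d+", "\\1", TestStrings)
# [1] "Some number < 100" "Some number > 999" "Some number $1000" "Some number "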
