
Error trapping with regex

I have the following column in a data frame:

ColumnA <- c("Kuala Lumpur Sector 2 new", "old Jakarta Sector31", "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

and I am extracting the Sector number into a separate column:

gsub(".*Sector ?([0-9]+).*", "\\1", ColumnA)
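For context, here is a minimal sketch of why the missing case needs handling at all: when the pattern does not match, gsub leaves the string untouched instead of returning a blank. The vector x and the name res are made up for illustration.

```r
# Hypothetical two-element input: the first string has no "Sector" in it
x <- c("Kuala Lumpur 2 new", "old Jakarta Sector31")

# Same pattern as above; where it cannot match, nothing is replaced
res <- gsub(".*Sector ?([0-9]+).*", "\\1", x)

print(res)
# first element comes back unchanged ("Kuala Lumpur 2 new"), second is "31"
```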

Is there a more elegant way than an if/else statement to handle rows where 'Sector' does not appear?

If the word 'Sector' does not appear on one line I simply want to set the value of that row to blank.

I thought of using str_detect first to check whether 'Sector' is present (TRUE/FALSE), but that is quite an ugly solution.

Thanks for any help.

If the word 'Sector' does not appear on one line I simply want to set the value of that row to blank.

To achieve that, use the alternation operator |:

ColumnA <- c("Kuala Lumpur 2 new", "old Jakarta Sector31", "Sector 9, 7 Hong Kong", "Jakarta new Sector22")
gsub("^(?:.*Sector ?([0-9]+).*|.*)$", "\\1", ColumnA)

Result: [1] "" "31" "9" "22". Because "Kuala Lumpur 2 new" contains no Sector, the second alternative, which has no capturing group, matches the whole string. Group 1 is therefore empty, and the whole string is replaced with "".
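As a rough sketch of the same idea with the two branches spelled out: using sub instead of gsub (the anchored pattern can match each string only once) and passing perl = TRUE are my own choices here, not part of the answer above, and the names pattern and sectors are only illustrative.

```r
ColumnA <- c("Kuala Lumpur 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

# Branch 1: .*Sector ?([0-9]+).*  -> "Sector" plus digits; group 1 holds the digits
# Branch 2: .*                    -> anything else; group 1 stays empty
pattern <- "^(?:.*Sector ?([0-9]+).*|.*)$"

# perl = TRUE is an assumption, chosen so the (?: ) group is handled by PCRE
sectors <- sub(pattern, "\\1", ColumnA, perl = TRUE)

print(sectors)
# values: "", "31", "9", "22"
```

Since every string matches one of the two branches, every element is replaced, either with the digits or with an empty string.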


library(stringr)
# str_extract() returns NA where nothing matches; str_replace_na() turns those NAs into ""
str_replace_na(str_extract(ColumnA, "(?<=Sector\\s{0,10})[0-9]+"), "")

I think this is what you need.
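If the bounded look-behind feels awkward, a capture group with str_match is one possible alternative. This is only a sketch: it assumes stringr is installed, swaps the \\s{0,10} bound for \\s*, and uses the illustrative names m and sectors.

```r
library(stringr)

ColumnA <- c("Kuala Lumpur 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

# str_match() returns a character matrix:
# column 1 is the full match, column 2 is capture group 1 (the digits)
m <- str_match(ColumnA, "Sector\\s*([0-9]+)")

# Rows without a match are NA; turn those into blanks
sectors <- m[, 2]
sectors[is.na(sectors)] <- ""

print(sectors)
# values: "", "31", "9", "22"
```

Because the digits come from a capture group rather than a look-behind, the whitespace between "Sector" and the number does not need an upper bound.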
