
How can I remove all letters from a string except a final "X"?

A dataset of ISBNs includes some malformed entries that contain letters. Since the only valid letter in an ISBN is an X in the last position, I would like to remove all other letters using gsub.

Here is a short example with the desired outcomes:

str1 <- "1234X"  # desired outcome: "1234X"
str2 <- "12X34"  # desired outcome: "1234"
str3 <- "XXXXX"  # desired outcome: "" (empty string)
str4 <- "1234B"  # desired outcome: "1234"

Any recommendations?

Another option is to delete all non-digits while keeping an X that sits at the end of the string and directly follows a digit:

gsub("((?<=\\d)X$)|\\D", "\\1", str1, perl = TRUE)
[1] "1234X" "1234"  ""      "1234" 
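The trick in this answer is the replacement "\\1": when the first alternative matches, group 1 holds the "X" and it is written back; when \\D matches instead, group 1 is unset and (with perl = TRUE, as the output above shows) expands to an empty string. A minimal sketch that wraps the same pattern in a helper; the name strip_letters_keep_final_x is hypothetical and not part of the original answer:

```r
# Hypothetical helper wrapping the lookbehind answer; same pattern, same behavior.
strip_letters_keep_final_x <- function(x) {
  # Alternative 1: an "X" at the very end that directly follows a digit.
  #   It is captured in group 1, so "\\1" writes it back unchanged.
  # Alternative 2: any other non-digit character. Group 1 is unset for
  #   this match, so "\\1" expands to "" and the character is deleted.
  gsub("((?<=\\d)X$)|\\D", "\\1", x, perl = TRUE)
}

strip_letters_keep_final_x(c("1234X", "12X34", "XXXXX", "1234B"))
```

Because the lookbehind inspects the original string rather than the partially rewritten one, a final X is kept only if the character immediately before it in the input is a digit.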

We could simply use gsub with (*SKIP)(*F) so that a match attempt at a final 'X' ( $ ) is skipped, while one or more uppercase letters ( [A-Z]+ ) anywhere else are matched and replaced with a blank ( "" ):

gsub("X$(*SKIP)(*F)|[A-Z]+", "", str1, perl = TRUE)
#[1] "1234X" "1234"  ""      "1234" 
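The two patterns agree on the four sample strings, but they are not equivalent in general. The sketch below uses made-up inputs (not from the question) to show where they diverge: the lookbehind pattern deletes every non-digit, including hyphens and lowercase letters, and also drops a final X that follows a hyphen; the (*SKIP)(*F) pattern only touches uppercase letters, so hyphens and a lowercase "x" are left in place.

```r
# Hypothetical inputs chosen to expose the differences; not from the original question.
x <- c("12-34X", "1234x", "12-34-X")

# Lookbehind answer: removes every non-digit; a final "X" survives only
# when the character right before it in the input is a digit.
lookbehind <- gsub("((?<=\\d)X$)|\\D", "\\1", x, perl = TRUE)
# lookbehind: "1234X", "1234", "1234"

# (*SKIP)(*F) answer: a final "X" is skipped, other runs of A-Z are removed;
# hyphens and lowercase letters are never matched.
skip_fail <- gsub("X$(*SKIP)(*F)|[A-Z]+", "", x, perl = TRUE)
# skip_fail: "12-34X", "1234x", "12-34-X"

print(lookbehind)
print(skip_fail)
```

Which behavior is preferable depends on whether the raw data may contain hyphenated ISBNs or a lowercase check character.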

data

str1 <- c("1234X", "12X34", "XXXXX", "1234B")
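Removing letters does not by itself produce a usable ISBN: "XXXXX" ends up as an empty string. A minimal sketch, assuming you want to flag such leftovers after cleaning, checks that each result is one or more digits optionally followed by a single trailing X. The variable is_plausible is a hypothetical name, and the check deliberately ignores ISBN length and the check-digit calculation.

```r
str1 <- c("1234X", "12X34", "XXXXX", "1234B")

# Clean with the lookbehind answer from above.
cleaned <- gsub("((?<=\\d)X$)|\\D", "\\1", str1, perl = TRUE)

# Hypothetical follow-up check: digits, then at most one trailing "X".
# Entries that fail (e.g. the empty string left by "XXXXX") could be
# set aside for manual review instead of being kept as ISBNs.
is_plausible <- grepl("^[0-9]+X?$", cleaned)

data.frame(original = str1, cleaned = cleaned, is_plausible = is_plausible)
```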

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM