How can I remove all letters from a string but the last if it is "X"?

Question

A dataset of ISBNs includes some malformed entries that contain letters. Since the only valid letter in an ISBN is an X in the last position, I would like to remove all other letters using gsub.

Here is a short example with the desired outcomes:

str1 <- "1234X"   Desired outcome: "1234X"

str2 <- "12X34"   Desired outcome: "1234"

str3 <- "XXXXX"   Desired outcome: "" (empty string)

str4 <- "1234B"   Desired outcome: "1234"

Any recommendations?

Answer 1

Another option is to delete all non-digits while keeping an X that sits at the end of the string directly after a digit:

gsub("((?<=\\d)X$)|\\D", "\\1", str1, perl = TRUE)
#[1] "1234X" "1234"  ""      "1234"
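
As a sketch of how the pattern above works, the call can be wrapped in a small helper. The name clean_isbn and the hyphenated test value are assumptions added for illustration and are not part of the answer. The first alternative captures an X only when it is at the very end of the string and directly follows a digit. The second alternative matches any other non-digit, and since the capture group is empty in that case, the match is replaced by nothing.

```r
# Hypothetical helper name; the pattern itself is the one from the answer.
clean_isbn <- function(x) {
  # Alternative 1: an "X" at the end of the string, preceded by a digit,
  #                is captured and written back via "\\1".
  # Alternative 2: any other non-digit matches with an empty group 1,
  #                so it is replaced by the empty string.
  gsub("((?<=\\d)X$)|\\D", "\\1", x, perl = TRUE)
}

str1 <- c("1234X", "12X34", "XXXXX", "1234B")
clean_isbn(str1)
#[1] "1234X" "1234"  ""      "1234"

# Because \\D matches every non-digit, hyphens are removed as well.
clean_isbn("0-306-40615-X")
#[1] "030640615X"
```

Note that this also strips lowercase letters, spaces, and punctuation, which may or may not be wanted depending on how the ISBNs are stored.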

Answer 2

We could use gsub with a pattern that matches an 'X' at the end ( $ ) of the string and skips it with (*SKIP)(*F), while otherwise matching one or more uppercase letters ( [A-Z]+ ) and replacing them with an empty string ( "" ):

gsub("X$(*SKIP)(*F)|[A-Z]+", "", str1, perl = TRUE)
#[1] "1234X" "1234"  ""      "1234"
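
One property of this pattern may be worth checking against real data: [A-Z]+ only targets uppercase letters, so lowercase letters and punctuation are left in place. A minimal sketch, where the helper name strip_upper and the two extra test values are assumptions for illustration:

```r
# Hypothetical helper name; the pattern is the one from the answer.
strip_upper <- function(x) {
  # "X$(*SKIP)(*F)" consumes a trailing "X" and then forces a failure,
  # so that "X" is never part of a match; "[A-Z]+" removes the rest.
  gsub("X$(*SKIP)(*F)|[A-Z]+", "", x, perl = TRUE)
}

strip_upper(c("1234X", "12X34", "XXXXX", "1234B"))
#[1] "1234X" "1234"  ""      "1234"

# Lowercase letters are not in [A-Z], so they are kept.
strip_upper("12a34")
#[1] "12a34"

# Hyphens are not letters, so they are kept too.
strip_upper("0-306-40615-X")
#[1] "0-306-40615-X"
```

If lowercase letters should be removed as well, using [A-Za-z]+ in place of [A-Z]+ would be one way to extend it.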

data

str1 <- c("1234X", "12X34", "XXXXX", "1234B")
