A dataset of ISBNs includes some malformed entries that contain letters. Since the only valid letter in an ISBN is an X in the last position, I would like to remove all other letters using gsub.
Here is a short example with the desired outcomes:
str1 <- "1234X"   Desired outcome: "1234X"
str2 <- "12X34"   Desired outcome: "1234"
str3 <- "XXXXX"   Desired outcome: "" (empty string)
str4 <- "1234B"   Desired outcome: "1234"
Any recommendations?
Another option is to delete every non-digit while keeping an X that sits at the end of the string directly after a digit:
gsub("((?<=\\d)X$)|\\D", "\\1", str1, perl = TRUE)
[1] "1234X" "1234" "" "1234"
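To see why this works, the pattern can be read as two alternatives. The following is only a sketch of that reading: it rebuilds the example vector from the question under the illustrative name isbn (not a name from the original post) and applies the same call.

```r
# Example input from the question; `isbn` is an illustrative name.
isbn <- c("1234X", "12X34", "XXXXX", "1234B")

# Alternative 1: (?<=\\d)X$  -> an X at the very end, preceded by a digit.
#                               It is captured as group 1 and put back by "\\1".
# Alternative 2: \\D         -> any other non-digit. Group 1 is empty here,
#                               so the character is replaced by nothing.
cleaned <- gsub("((?<=\\d)X$)|\\D", "\\1", isbn, perl = TRUE)
print(cleaned)
```

Because the second alternative is \\D, this version also drops punctuation and spaces, not just letters.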
We could simply use gsub with a pattern that matches an 'X' at the end ($) of the string and SKIPs it, while otherwise matching one or more upper-case letters ([A-Z]+) and replacing them with a blank (""):
gsub("X$(*SKIP)(*F)|[A-Z]+", "", str1, perl = TRUE)
#[1] "1234X" "1234" "" "1234"
Data used above:
str1 <- c("1234X", "12X34", "XXXXX", "1234B")
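As a rough sketch, the (*SKIP)(*F) pattern can be wrapped in a small helper. The name strip_isbn_letters is hypothetical and chosen only for illustration. The second call shows one consequence of using [A-Z]: lower-case letters are not touched.

```r
# Hypothetical helper wrapping the (*SKIP)(*F) pattern from the answer.
strip_isbn_letters <- function(x) {
  # "X$(*SKIP)(*F)" matches a final X, then forces a failure and skips past it,
  # so that X is left alone; "[A-Z]+" removes any other run of upper-case letters.
  gsub("X$(*SKIP)(*F)|[A-Z]+", "", x, perl = TRUE)
}

str1 <- c("1234X", "12X34", "XXXXX", "1234B")
result <- strip_isbn_letters(str1)
print(result)

# The character class only covers upper-case A-Z, so a lower-case letter survives.
lower_case <- strip_isbn_letters("1234b")
print(lower_case)
```

If the data may contain lower-case letters, the character class would need to be widened, for example to [A-Za-z].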