
Discrepancy with R regular expression match

I can't figure out why this R regular expression doesn't match both of these two strings. As I understand it, this expression should match any string with the same lower-case letter appearing twice within the string. It matches one of the strings ("Moro"), but not the second ("moro"), even though both strings contain a repeated lower-case "o". What's going on here?

Executed in R (3.4.3):

grep("([a-z]).*\\1", c("Moro", "moro"), value=TRUE)

[1] "Moro"

The same thing occurs with this regex, which I believe is equivalent to the one above:

grep("([[:lower:]]).*\\1", c("Moro", "moro"), value=TRUE)

[1] "Moro"

Thanks for any help!

This seems to be a regex flavor issue. If you set perl = TRUE, it works:

grep("([a-z]).*\\1", c("Moro", "moro", "mora"), value=TRUE, perl = TRUE)
# [1] "Moro" "moro"

Worth noting that stringr and stringi work out-of-the-box:

stringr::str_detect(c("Moro", "moro", "mora"), "([a-z]).*\\1")
# [1]  TRUE  TRUE FALSE

stringi::stri_detect(c("Moro", "moro", "mora"), regex = "([a-z]).*\\1")
# [1]  TRUE  TRUE FALSE
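
If you want to confirm which result is the correct one without relying on any regex engine, a small base R cross-check can help. This is only a sketch: `has_repeated_lower` is a hypothetical helper name, and it assumes "lower-case letter" means ASCII a-z (the built-in `letters` constant).

```r
# Hypothetical helper: TRUE when a string contains the same
# lower-case ASCII letter at least twice. No regex involved.
has_repeated_lower <- function(x) {
  vapply(strsplit(x, ""), function(chars) {
    lower <- chars[chars %in% letters]  # keep only a-z
    anyDuplicated(lower) > 0            # any letter seen twice?
  }, logical(1), USE.NAMES = FALSE)
}

words <- c("Moro", "moro", "mora")

# Cross-check: should agree with the perl = TRUE regex
has_repeated_lower(words)
# [1]  TRUE  TRUE FALSE
grepl("([a-z]).*\\1", words, perl = TRUE)
# [1]  TRUE  TRUE FALSE
```

Both calls agree that "moro" should match, which suggests the default engine's result is the odd one out.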

I'm not sure, but my guess is that the problem arises because the character class can match any letter. If you use a simple [o] instead, it works:

grep("([a-z]).*\\1", c("Moro", "moro"), value=TRUE)
# [1] "Moro"
grep("([o]).*\\1", c("Moro", "moro"), value=TRUE)
# [1] "Moro" "moro"

