Is it possible to use R's base::strsplit() without consuming the pattern?

Question

I have a string that consists entirely of simple repetitions of the pattern [[:digit:]]+[A-Z], for instance 12A432B4B.

I want to use base::strsplit() to get:

[1] "12A" "432B" "4B"

I thought I could use a lookahead to split at each letter while keeping it, with unlist(strsplit("12A432B4B", "(?<=.)(?=[A-Z])", perl = TRUE)), but as can be seen the split lands in the wrong place:

[1] "12"   "A432" "B4"   "B"

I can't wrap my head around a pattern that works with this strsplit strategy. Explanations would be really appreciated.

Bonus: I also failed to use a back reference in gsub, e.g. this pattern does not work: gsub("([[:digit:]]+[A-Z])+", "\\1", "12A432B4B"). Also, can you retrieve more than the \\1 to \\9 groups, say if [[:digit:]]+[A-Z] repeats more than 9 times?

Answer 1

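We can use regex lookarounds to split between an upper-case letter and a digit. The pattern in the question splits before each letter, because its lookahead (?=[A-Z]) requires the letter to come next; putting the letter in the lookbehind instead moves the split point to just after it: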

strsplit(str1, "(?<=[A-Z])(?=[0-9])", perl = TRUE)[[1]]
#[1] "12A"  "432B" "4B"

data

str1 <- "12A432B4B"
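
As a rough side-by-side check, the sketch below runs the pattern from the question and the pattern from this answer on the same input. The variable names before_letter and after_letter are only illustrative and do not come from the original post.

```r
str1 <- "12A432B4B"

# Pattern from the question: zero-width position *before* each upper-case letter
before_letter <- strsplit(str1, "(?<=.)(?=[A-Z])", perl = TRUE)[[1]]

# Pattern from this answer: zero-width position *after* a letter and before a digit
after_letter <- strsplit(str1, "(?<=[A-Z])(?=[0-9])", perl = TRUE)[[1]]

print(before_letter)
#[1] "12"   "A432" "B4"   "B"
print(after_letter)
#[1] "12A"  "432B" "4B"
```

Only the second pattern leaves each letter attached to the digits that precede it.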

Answer 2

The pattern mentioned in the post can be used as it is in str_extract_all (from the stringr package):

library(stringr)
str_extract_all(string, '[[:digit:]]+[A-Z]')[[1]]
#[1] "12A"  "432B" "4B"

Or in base R:

regmatches(string, gregexpr('[[:digit:]]+[A-Z]', string))[[1]]

where string is:

string <- '12A432B4B'
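
On the bonus question, here is a minimal sketch in base R. It rests on two points: a capture group under a + quantifier keeps only its last repetition, and replacement strings in sub()/gsub() only understand \\1 to \\9. Matching one token per replacement avoids both limits, since only \\1 is ever needed. The names last_only, spaced and tokens are illustrative, and perl = TRUE is an assumption made here so that the repeated-group behaviour is that of PCRE.

```r
string <- "12A432B4B"

# A repeated group keeps only its final repetition, so \\1 is just the last token
last_only <- gsub("([[:digit:]]+[A-Z])+", "\\1", string, perl = TRUE)
print(last_only)
#[1] "4B"

# Match one token at a time and append a separator; only \\1 is needed,
# no matter how many tokens the string contains
spaced <- gsub("([[:digit:]]+[A-Z])", "\\1 ", string, perl = TRUE)
tokens <- strsplit(spaced, " ", fixed = TRUE)[[1]]
print(tokens)
#[1] "12A"  "432B" "4B"
```

The separator approach assumes the input never contains a space itself; the regmatches/gregexpr call above does the same job without that assumption.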

2 answers

solution1: 1, ACCEPTED, 2020-09-02 21:42:16

solution2: 1, 2020-09-03 01:10:20
