For example, the source string is "appleappleapplebanana" and the pattern I want to delete is "appleapple".
I want to delete every occurrence of "appleapple", even when occurrences overlap, so that only "banana" is left.
appleappleapplebanana
^^^^^^^^^^ <-first occurrence
     ^^^^^^^^^^ <-second occurrence
If I use replaceAll, the result is "applebanana" since after deleting the first one, the remaining part is just "applebanana".
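To make the behaviour described above concrete, here is a minimal sketch. The class and method names are made up for illustration, and `Pattern.quote` is only there so that the pattern is treated as a literal string.

```java
import java.util.regex.Pattern;

final class NonOverlappingDemo {
    // Ordinary replaceAll: after each match, scanning resumes behind that match,
    // so an occurrence that starts inside an already matched one is never seen.
    static String removeNonOverlapping(String source, String literal) {
        return source.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        // The second "appleapple" starts inside the first one and is missed.
        System.out.println(removeNonOverlapping("appleappleapplebanana", "appleapple")); // applebanana
        System.out.println(removeNonOverlapping("aaabbbaaabbbaaa", "aaabbbaaa"));        // bbbaaa
    }
}
```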
Expected results:
Input | Pattern | Result |
---|---|---|
"appleapplebanana" | "appleapple" | "banana" |
"appleappleapplebanana" | "appleapple" | "banana" |
"applebanana" | "appleapple" | "applebanana" |
"aaabbbaaabbbaaa" | "aaabbbaaa" | "" (empty string) |
I need to process arbitrary input patterns, so just using replace("apple") wouldn't work.
Though I have an idea for this:
However, I would like to know if there is a better (fancier, ready-made) way to achieve this.
I ended up writing my own function using the idea above, since no common library or package seems to support this feature.
The question was a bit confusing at first. After the updates, I think the best provided example to illustrate the problem is matching the "pattern" aaabbbaaa in aaabbbaaabbbaaa.
aaabbbaaabbbaaa
aaabbbaaa
      aaabbbaaa
      ^-^ < overlapping part
^-------------^ < match this part: 'aaa' is overlapping
If the length of the "pattern" string may be used in the regex, a lookbehind could be used:
.{1,9}(?<=aaabbbaaa)
This regex (demo) matches from one character up to the pattern's length, as long as aaabbbaaa lies directly behind the end of the match. So it will match aaabbbaaa but also bbbaaa, because the last a is also preceded by aaabbbaaa, and due to the length restriction it will not skip over any other substring. It will also match non-overlapping occurrences in aaabbbaaaaaabbbaaa but leave e.g. ccc in aaabbbaaacccaaabbbaaa.
A Java demo at tio.run that incorporates the length:
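// Pattern.quote (java.util.regex.Pattern) keeps regex metacharacters in pat literal
String regex = ".{1," + pat.length() + "}(?<=" + Pattern.quote(pat) + ")";
Pattern p = Pattern.compile(regex);
// Remove every match, including the trailing parts of overlapping occurrences
String result = p.matcher(str).replaceAll("");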
For longer inputs it can be more efficient to add a lookahead to start the match and wrap the lookbehind part into a repeated group with at least one repetition:
(?=aaabbbaaa)(?:.{1,9}(?<=aaabbbaaa))+
This can almost double the performance (demo) but is less efficient on shorter strings than the version without the lookahead. Furthermore, you can use \w (word character) instead of the dot if the input contains non-word characters.
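As a rough sketch, the lookahead variant can be built for an arbitrary literal like this. The class and method names are hypothetical, and the `Pattern.quote` call, the `DOTALL` flag and the empty-pattern guard are my own additions, assuming the pattern is meant as a plain string and the input may contain line breaks.

```java
import java.util.regex.Pattern;

final class OverlapRegex {
    // Builds (?=PAT)(?:.{1,LEN}(?<=PAT))+ for a literal PAT and removes all matches.
    static String removeOverlapping(String source, String literal) {
        if (literal.isEmpty()) {
            return source; // nothing sensible to remove
        }
        String quoted = Pattern.quote(literal); // treat metacharacters literally
        String regex = "(?=" + quoted + ")(?:.{1," + literal.length() + "}(?<=" + quoted + "))+";
        // DOTALL lets the dot cross line terminators as well.
        return Pattern.compile(regex, Pattern.DOTALL).matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeOverlapping("appleappleapplebanana", "appleapple")); // banana
        System.out.println(removeOverlapping("aaabbbaaacccaaabbbaaa", "aaabbbaaa"));  // ccc
    }
}
```

The lookahead makes sure a match can only start where the pattern starts, and each repetition of the group extends the match to the end of the next overlapping occurrence.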
Technically, this is over-lapping.
appleapple
appleappleappleapple
appleapple
And, this is repeating.
appleapple
appleapple
appleapple
Although you could refer to the latter as having over-lapped, that is intrinsically not a property of a pattern that is considered to have a repeating quality.
At that point it would be inherent, and so redundant; it's just a description.
In addition to String#replace there is also String#replaceAll.
It takes a regular expression pattern as its first argument.
You could use the following pattern to replace repeating values that have over-lapped.
(apple)\1+
replaceAll("(apple)\\1+", "")
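A small sketch of how this back-reference pattern behaves, generalised to an arbitrary repeating unit. The class and method names are illustrative only, and quoting the unit with `Pattern.quote` is my assumption, not part of the pattern above.

```java
import java.util.regex.Pattern;

final class RepeatRemover {
    // Removes runs of two or more adjacent copies of unit, i.e. (unit)\1+.
    static String removeRepeats(String source, String unit) {
        String regex = "(" + Pattern.quote(unit) + ")\\1+";
        return source.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        System.out.println(removeRepeats("appleappleapplebanana", "apple")); // banana
        System.out.println(removeRepeats("applebanana", "apple"));           // applebanana
        // True overlaps are not adjacent repeats, so nothing is removed here.
        System.out.println(removeRepeats("aaabbbaaabbbaaa", "aaabbbaaa"));   // aaabbbaaabbbaaa
    }
}
```

This only works when the pattern to delete is itself a repetition of a known unit, which is why it does not cover the aaabbbaaa case.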
I'm not sure if there is a way to remove over-lapping values using a single pattern.
I imagine it would be much more complex.
You mentioned "... mark corresponding characters as 'to-be deleted'".
This would most likely be the logical way to remove truly over-lapping values.
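The marking idea could look roughly like the sketch below. This is my reading of it, not the asker's actual function: every occurrence is located with `indexOf`, including ones that start inside an earlier occurrence, its characters are flagged, and the unflagged characters are copied out at the end. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class MarkAndDelete {
    static String removeAllOverlapping(String source, String pattern) {
        if (pattern.isEmpty()) {
            return source; // an empty pattern would match everywhere
        }
        boolean[] toDelete = new boolean[source.length()];
        // Advance by one index, not by pattern.length(), so overlaps are found too.
        for (int i = source.indexOf(pattern); i >= 0; i = source.indexOf(pattern, i + 1)) {
            Arrays.fill(toDelete, i, i + pattern.length(), true);
        }
        StringBuilder kept = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            if (!toDelete[i]) {
                kept.append(source.charAt(i));
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAllOverlapping("appleappleapplebanana", "appleapple")); // banana
        System.out.println(removeAllOverlapping("aaabbbaaabbbaaa", "aaabbbaaa").isEmpty()); // true
    }
}
```

Because it works on plain strings, this version needs no escaping of regex metacharacters in the pattern.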