I want to split a string in Java using a regular expression, looking both ahead of and behind each tag so that no part of the string is lost.
For example:
test <img border=\"0\" src=\"test\" />hi<img border=\\\"0\\\" src=\\\"test\\\" /> test3"
Given the above string, the expected output is:
test
<img border=\"0\" src=\"test\" />
hi
<img border=\"0\" src=\"test\" />
test3"
Below is what I have tried:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestParse {
    private static final String IMG_S_LookBehind = "(?<=\\>)";
    private static final String IMG_S_LookAHead = "(?=<img .*?\\>)";
    static String test = "test <img border=\"0\" src=\"test\" />hi<img border=\\\"0\\\" src=\\\"test\\\" /> test3";

    static Pattern newPattern(String tag) {
        return Pattern.compile(String.format("(<%s\\s*([^>]*)>)(.*)(</%s>)", tag, tag));
    }

    public static void main(String[] args) {
        // Pattern re = newPattern("b");
        // Matcher m = re.matcher(test);
        //
        // if (m.matches()) {
        //     for (int i = 0; i <= m.groupCount(); i++) {
        //         System.out.printf("[%d]: [%s]\n", i, m.group(i));
        //     }
        // }
        String[] split = test.split(IMG_S_LookAHead);
        // Print each piece on its own line; println(split) would only print the array reference.
        for (String part : split) {
            System.out.println(part);
        }
    }
}
OUTPUT:
test
<img border=\"0\" src=\"test\" />hi
<img border=\"0\" src=\"test\" /> test3"
I also tried a lookbehind, but it still does not give me the expected output. Any pointers would be appreciated.
I wouldn't approach this with a regex split, because it is hard to express the boundaries between tags and non-tag text as a single split pattern. Instead, I would match either a tag or a run of text that is not a tag. Here is a working sample script:
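// Requires java.util.regex.Matcher and java.util.regex.Pattern; place this inside a method.
String input = "test <img border=\"0\" src=\"test\" />hi<img border=\\\"0\\\" src=\\\"test\\\" /> test3";
// Match either a complete tag, or a run of characters in which no tag begins.
String pattern = "<[^>]+>|((?!<[^>]+>).)*";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
while (m.find()) {
    System.out.println(m.group(0));
}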
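This prints the following (the first segment keeps its trailing space, the last keeps its leading space, and the loop also emits one empty match at the end of the input):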
test
<img border="0" src="test" />
hi
<img border=\"0\" src=\"test\" />
test3
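If the surrounding spaces and the empty match at the end of the input are unwanted, one option is to trim each match and drop the empty ones. The following is only a minimal sketch under that assumption; the class name TagTokens and the helper tokenize are hypothetical names chosen for illustration, and the regex is the same one used in the script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagTokens {
    // Same alternation as in the script: a whole tag, or a run of non-tag text.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|((?!<[^>]+>).)*");

    // Hypothetical helper: returns the trimmed, non-empty matches in input order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String token = m.group().trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "test <img border=\"0\" src=\"test\" />hi<img border=\\\"0\\\" src=\\\"test\\\" /> test3";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Collecting into a list rather than printing directly keeps the filtering in one place, which may be convenient if the pieces are processed further.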
Perhaps one portion of the regex needs to be explained:
((?!<[^>]+>).)*
This matches any run of characters, stopping just before the point where a complete tag begins. The trick is called a "tempered dot" because it is really just .* with a negative lookahead applied before each character, checking that no tag starts at that position.
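To see the difference in isolation, a plain .* can be compared with the tempered version on a short input. This is a small sketch, not part of the original script; the class name TemperedDotDemo, the helper firstMatch, and the shortened input string are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedDotDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "hi<img src=\"test\" /> test3";
        // Plain greedy dot: runs straight through the tag to the end of the input.
        System.out.println(firstMatch(".*", input));
        // Tempered dot: the negative lookahead stops the match right before the tag.
        System.out.println(firstMatch("((?!<[^>]+>).)*", input));
    }
}
```

The first call returns the whole input, while the second returns only hi, since the lookahead fails as soon as the next position starts a complete tag.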