I have a problem parsing text. I have an interview transcript in which tags mark which channel is speaking (ch1, ch2). I need to break it into an array so that I can search for which channel says a specific word.
For example, this is part of the interview:
<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>
As a Java string:
String text = "<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>";
And I want this output:
String[] output = {"<ch1>Hello</ch1>", "<ch2>Hello</ch2>", ...};
Thanks for help.
You can split on a regular expression that uses lookbehind and lookahead:
String dialogue = "<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>";
String[] statements = dialogue.split("(?<=</ch[12]>)\\s*(?=<ch[12]>)");
System.out.println(Arrays.asList(statements));
Output:
[<ch1>Hello</ch1>, <ch2>Hello</ch2>, <ch1>How are you</ch1>, <ch2>I'm fine</ch2>]
It's a bit hard to read due to the many < and >, but the pattern is like this:
split("(?<=endOfLastPart)inBetween(?=startOfNextPart)")
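Since the goal is to find which channel says a given word, here is a minimal sketch of one way to do that. It does not use split; instead it walks the dialogue with Pattern and Matcher and captures the channel number and the spoken text of each statement. The class name ChannelSearch and the method channelsSaying are made up for this example, and it assumes only ch1 and ch2 occur and that tags are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class ChannelSearch {
    // One statement: group 1 is the channel digit, group 2 is the spoken text.
    // The backreference \1 makes the closing tag match the opening tag.
    private static final Pattern STATEMENT =
            Pattern.compile("<ch([12])>(.*?)</ch\\1>", Pattern.DOTALL);

    /** Returns the channel of every statement that contains the word, in order. */
    static List<String> channelsSaying(String dialogue, String word) {
        // Whole-word, case-insensitive match; quote() protects regex metacharacters.
        Pattern wordPattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        List<String> channels = new ArrayList<>();
        Matcher m = STATEMENT.matcher(dialogue);
        while (m.find()) {
            if (wordPattern.matcher(m.group(2)).find()) {
                channels.add("ch" + m.group(1));
            }
        }
        return channels;
    }

    public static void main(String[] args) {
        String dialogue = "<ch1>Hello</ch1> <ch2>Hello</ch2> "
                + "<ch1>How are you</ch1><ch2>I'm fine</ch2>";
        System.out.println(channelsSaying(dialogue, "hello")); // [ch1, ch2]
        System.out.println(channelsSaying(dialogue, "fine"));  // [ch2]
    }
}
```

The same check could also be run on each element of the array produced by split; capturing the groups directly just avoids parsing the tags a second time.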
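Another option is to split on "<ch", re-join the pieces with a marker in front of each tag, and then split on that marker:
text.split("<ch").join("-<ch").split("-")
Any string that does not occur in the text can be used instead of "-".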
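The split/join/split chain is written in a JavaScript-like style; Java arrays have no join method, so a Java version might look like the sketch below. It uses String.join for the middle step and also trims whitespace and drops the empty leading piece that the first split produces. The class MarkerSplit and the method splitStatements are hypothetical names, and the marker is assumed not to appear anywhere in the transcript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the marker approach in Java.
class MarkerSplit {
    static List<String> splitStatements(String text, String marker) {
        // Step 1: split on the start of every opening tag (this removes "<ch").
        String[] pieces = text.split("<ch", -1);
        // Step 2: re-join, putting the marker plus the removed "<ch" back in.
        String marked = String.join(marker + "<ch", pieces);
        // Step 3: split on the marker; quote() treats it as a literal string.
        List<String> statements = new ArrayList<>();
        for (String part : marked.split(Pattern.quote(marker))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {      // skip the empty piece before the first tag
                statements.add(trimmed);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String text = "<ch1>Hello</ch1> <ch2>Hello</ch2> "
                + "<ch1>How are you</ch1><ch2>I'm fine</ch2>";
        // Assumption: "-" never occurs in the transcript itself.
        System.out.println(splitStatements(text, "-"));
        // [<ch1>Hello</ch1>, <ch2>Hello</ch2>, <ch1>How are you</ch1>, <ch2>I'm fine</ch2>]
    }
}
```

If the transcript might contain hyphens, a marker that cannot occur in ordinary text would be a safer choice than "-".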