Here is my question:
I have a very long string containing many values delimited by different tags. The values include Chinese and English text and digits.
I want to split the string by a specific pattern. The following is an example. (I want to find a pattern xxxxxx where xxxxxx is Chinese, English, digits, or any other symbol except "<" or ">", since those two symbols are used to identify the tags.)
However, I found something strange with this pattern. It does not seem to recognize the first two tags, only the second one.
String a = "<f\"number\">4 <f\"number\"><f$n0>14 <h85><f$n0>4 <f$n0>2 <f$n0>2 7 -<f\"Times-Roman\">7<f\"number\">";
Pattern p = Pattern.compile("<f\"number\">[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]*<f\"number\">");
Matcher m = p.matcher(a);
while (m.find()) {
    System.out.println(m.group());
}
The output is the same as my String a.
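The character class [\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]* matches zero or more of any character, because \\P{sc=Han} (not Han) and \\p{sc=Han} (Han) are opposites and together cover every character. Also, inside a character class a * is a literal, and a ^ that is not the first character is a literal too, so "^<>" adds "<" and ">" to the class instead of excluding them.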
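To see this concretely, here is a minimal sketch that tests the question's character class (without its trailing quantifier) against single characters. The class name CharClassDemo and the helper matchesOne are made up for illustration; the Han character is written as a Unicode escape to avoid source-encoding problems.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // The character class from the question, without the trailing '*' quantifier.
    static final Pattern CLASS = Pattern.compile(
        "[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]");

    // Hypothetical helper: true if the whole input is exactly one character of the class.
    static boolean matchesOne(String s) {
        return CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "\u4e2d" is a Han character; the others are non-Han.
        String[] samples = {"<", ">", "7", "x", " ", "\u4e2d"};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + matchesOne(s));
        }
    }
}
```

Every sample, including "<" and ">", should be reported as matching, which is why the greedy quantifier runs to the last tag in the string.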
I guess you want:
Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9]*<f\"number\">");
You may want to allow whitespace as well (note the doubled backslash in \\s inside a Java string literal):
Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9\\s]*<f\"number\">");
or:
Pattern p = Pattern.compile("<f\"number\">[^<]*<f\"number\">");
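As a quick check, the sketch below runs two of these patterns against the string from the question. It assumes the lowercase \\p{sc=Han} (Han script) and an escaped \\s for whitespace; the class name TagMatchDemo and the helper findAll are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMatchDemo {
    // The sample string from the question.
    static final String SAMPLE =
        "<f\"number\">4 <f\"number\"><f$n0>14 <h85><f$n0>4 <f$n0>2 <f$n0>2 7 -"
        + "<f\"Times-Roman\">7<f\"number\">";

    // Hypothetical helper: collect every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Negated class: anything except '<' between the two tags.
        System.out.println(findAll("<f\"number\">[^<]*<f\"number\">", SAMPLE));
        // Explicit class: Han characters, ASCII letters, digits, whitespace.
        System.out.println(findAll(
            "<f\"number\">[\\p{sc=Han}a-zA-Z0-9\\s]*<f\"number\">", SAMPLE));
    }
}
```

Both calls should print a single match, `[<f"number">4 <f"number">]`. The final <f"number"> in the string is not followed by a second tag, so it produces no match.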