
Grouping regular expression

Here is my question:

I have a very long string containing many values delimited by different tags. The values include Chinese text, English words and digits.

I want to split it according to a specific pattern. For example, I want to find a value xxxx enclosed between two tags (<f"number"> in the code below), where xxxx may be Chinese, English, digits or any other symbol except "<" and ">", since those two symbols delimit the tags.

However, I noticed something strange with this pattern: it does not seem to recognize the first two tags, only the second one.

String a = "<f\"number\">4  <f\"number\"><f$n0>14   <h85><f$n0>4    <f$n0>2 <f$n0>2 7   -<f\"Times-Roman\">7<f\"number\">";
Pattern p = Pattern.compile("<f\"number\">[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]*<f\"number\">");
Matcher m = p.matcher(a);

while(m.find()){
    System.out.println(m.group());
}

The output is the same as my string a.

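The character class [\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]* matches zero or more of any character, because \\P{sc=Han} (not Han) and \\p{sc=Han} (Han) are complements and together cover every character. Inside a character class, * is a literal asterisk rather than a quantifier, and ^ negates only when it is the first character, so the trailing ^<> adds "^", "<" and ">" to the class instead of excluding them. The greedy match therefore runs from the first <f"number"> to the last one, which is the whole string.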
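The point about the character class can be checked directly. The following is a minimal sketch: the class name CharClassDemo and the helper inClass are made up for illustration, and the class is copied from the question without its trailing quantifier. Each call in main is expected to print true, including the ones for "<" and ">".

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // The character class from the question (without the trailing *),
    // written as a Java string literal.
    static final String QUESTION_CLASS =
        "[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]";

    // Hypothetical helper: true if the one-character string s
    // is accepted by the question's character class.
    static boolean inClass(String s) {
        return Pattern.matches(QUESTION_CLASS, s);
    }

    public static void main(String[] args) {
        System.out.println(inClass("<"));   // tag delimiter is accepted
        System.out.println(inClass(">"));   // tag delimiter is accepted
        System.out.println(inClass("*"));   // literal asterisk
        System.out.println(inClass("中"));  // Han character
        System.out.println(inClass(" "));   // whitespace
    }
}
```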

I guess you want:

Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9]*<f\"number\">");

You may also want to allow whitespace (note the doubled backslash in \\s inside a Java string literal):

Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9\\s]*<f\"number\">");

or:

Pattern p = Pattern.compile("<f\"number\">[^<]*<f\"number\">");
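As a rough end-to-end sketch of the negated-class approach, the program below runs against the string from the question. The class name NumberTagDemo, the method extractValues and the constant SAMPLE are illustrative names, not part of the original post. Two things differ from the pattern above and are assumptions about what the asker wants: the class is [^<>]* so that both tag delimiters are excluded, and a capturing group returns only the value between the tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberTagDemo {
    // Input string taken from the question.
    static final String SAMPLE =
        "<f\"number\">4  <f\"number\"><f$n0>14   <h85><f$n0>4    <f$n0>2 <f$n0>2 7   -"
        + "<f\"Times-Roman\">7<f\"number\">";

    // [^<>]* stops at the next tag delimiter; group 1 holds the value itself.
    static final Pattern VALUE =
        Pattern.compile("<f\"number\">([^<>]*)<f\"number\">");

    // Collects every value found between two <f"number"> tags.
    static List<String> extractValues(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Only the first "4" (with its trailing spaces) sits directly
        // between two <f"number"> tags in SAMPLE.
        System.out.println(extractValues(SAMPLE));
    }
}
```

Each match consumes its closing tag, so a value whose opening tag is the previous match's closing tag would be skipped. If that matters for the real data, replacing the final tag with a lookahead such as (?=<f\"number\">) is one option.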
