
Extract a substring between ; and WORD in Java using regex

I am trying to extract the text between a semicolon (;) and WORD. I am using the code below, but it fails to extract "TVS A3003".

Matcher matcher = Pattern.compile("(?<=;).*?(?=WORD)").matcher(string);

Sample strings:

1. (XYZTRR: KTTT 4.0.1; TVS A3003 WORD/LLLLL ; pj ;) 

2. (XcdcdRR; dTff 5.4.1; TVS A3003 WORD/UJH;KKKHH fpp) 

3. LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp

4. (;LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp ;)

I want to extract TVS A3003 in all of these cases.

You need to find a ; and then match any 0+ chars other than ; (as few as possible) up to the first occurrence of WORD. You may do that using

;([^;]*?)WORD

See the regex demo. Note that the leading/trailing whitespace can be easily trimmed off with .trim() after a match is found.

See the Java demo below:

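import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> strs = Arrays.asList("(XYZTRR: KTTT 4.0.1; TVS A3003 WORD/LLLLL ; pj ;)",
        "(XcdcdRR: dTff 5.4.1; TVS A3003 WORD/UJHKKKHH fpp)",
        "(LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp) );");
// Group 1 captures the text between a ';' and the next WORD without crossing another ';'
Pattern pattern = Pattern.compile(";([^;]*?)WORD");
for (String s : strs) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // trim() removes the whitespace around the captured text
        System.out.println(matcher.group(1).trim());
    }
}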

Output:

TVS A3003
TVS A3003
TVS A3003
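
To see why the pattern from the question fails where the negated character class succeeds, here is a small sketch that runs both against the second sample string from the question. The class and method names are made up for illustration; the two patterns are the ones quoted above. With .*? the match starts right after the first ; in the string and is free to run across later semicolons, whereas [^;]*? cannot cross a ; and so the regex engine has to retry from the next one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDotVersusNegatedClass {
    // Pattern from the question: '.' is allowed to match ';' as well.
    static final Pattern LAZY_DOT = Pattern.compile("(?<=;).*?(?=WORD)");
    // Pattern from the answer: [^;] can never cross a ';'.
    static final Pattern NEGATED_CLASS = Pattern.compile(";([^;]*?)WORD");

    // Trimmed whole match of the question's pattern, or null if nothing matches.
    static String withLazyDot(String input) {
        Matcher m = LAZY_DOT.matcher(input);
        return m.find() ? m.group().trim() : null;
    }

    // Trimmed group 1 of the answer's pattern, or null if nothing matches.
    static String withNegatedClass(String input) {
        Matcher m = NEGATED_CLASS.matcher(input);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sample = "(XcdcdRR; dTff 5.4.1; TVS A3003 WORD/UJH;KKKHH fpp)";
        System.out.println(withLazyDot(sample));      // dTff 5.4.1; TVS A3003
        System.out.println(withNegatedClass(sample)); // TVS A3003
    }
}
```

On the first sample string both patterns happen to give the same result, because the only ; before WORD is the one directly in front of the target text.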

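The regex is (?<=KTTT 4\\.0\\.1; )(.*)(?= WORD/U). Note that it hard-codes the literal text on both sides of the target, so it only matches strings that contain both "KTTT 4.0.1; " and " WORD/U".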

Matcher matcher = Pattern.compile("(?<=KTTT 4\\.0\\.1; )(.*)(?= WORD/U)").matcher(string);

if(matcher.find()){
     System.out.println(matcher.group());
}
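
Since the pattern above is anchored on the literal text "KTTT 4.0.1; " and " WORD/U", it will not match inputs whose prefix or suffix differs. A possible lookaround variant without those literals is sketched below; it is not part of the original answer, and the class name and the extract helper are hypothetical. It keeps the lookbehind/lookahead style but restricts the middle to [^;] so the match cannot begin before the last ; that precedes WORD.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundExtractor {
    // Lookbehind for ';', lookahead for WORD, and only non-';' chars in between.
    private static final Pattern BETWEEN = Pattern.compile("(?<=;)[^;]*?(?=WORD)");

    // Hypothetical helper: returns the trimmed match, or null when nothing matches.
    static String extract(String input) {
        Matcher matcher = BETWEEN.matcher(input);
        return matcher.find() ? matcher.group().trim() : null;
    }

    public static void main(String[] args) {
        // The four sample strings from the question.
        String[] samples = {
            "(XYZTRR: KTTT 4.0.1; TVS A3003 WORD/LLLLL ; pj ;)",
            "(XcdcdRR; dTff 5.4.1; TVS A3003 WORD/UJH;KKKHH fpp)",
            "LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp",
            "(;LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp ;)"
        };
        for (String s : samples) {
            System.out.println(extract(s)); // TVS A3003 for each sample
        }
    }
}
```

Because the lookarounds consume nothing, matcher.group() already holds only the text between ; and WORD, so no capturing group is needed; trim() is still required for the surrounding spaces.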


 