简体   繁体   中英

how to read string part in java

I have this string :

<meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd">

How can I get lastStop property value in JAVA?

This regex worked when tested on http://www.myregexp.com/

But when I try it in java I don't see the matched text, here is how I tried :

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SimpleRegexTest {
    public static void main(String[] args) {
        String sampleText = "<meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">";
        String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*";
        Pattern p = Pattern.compile(sampleRegex);
        Matcher m = p.matcher(sampleText);
        if (m.find()) {
            String matchedText = m.group();
            System.out.println("matched [" + matchedText + "]");
        } else {
            System.out.println("didn’t match");
        }
    }
}

Maybe the problem is that I use escape char in my test , but real string doesn't have escape inside. ?

UPDATE

Does anyone know why this doesn't work when used in java ? or how to make it work?

(?<=lastStop=[\"']?)[^\"]+

The reason it doesn't work as you expect is because of the * in [^\\"']* . The lookbehind is matching at the position before the " in lastStop=" , which is permitted because the quote is optional: [\\"']? . The next part is supposed to match zero or more non-quote characters, but because the next character is a quote, it matches zero characters.

If you change that * to a + , the second part will fail to match at that position, forcing the regex engine to bump ahead one more position. The lookbehind will match the quote, and [^\\"']+ will match what follows. However, you really shouldn't be using a lookbehind for this in the first place. It's much easier to just match the whole sequence in the normal way and extract the part you want to keep via a capturing group:

String sampleRegex = "lastStop=[\"']?([^\"']*)";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
    String matchedText = m.group(1);
    System.out.println("matched [" + matchedText + "]");
} else {
    System.out.println("didn’t match");
}

It will also make it easier to deal with the problem @Kobi mentioned. You're trying to allow for values contained in double-quotes, single-quotes or no quotes, but your regex is too simplistic. For one thing, a quoted value can contain whitespace, but an unquoted one can't. To deal with all three possibilities, you'll need two or three capturing groups, not just one.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM