
Lookahead with sub-regular expression

I have some data (to be precise, it comes from the Windows Registry) that looks like this:

some data ... PACKAGE_SIZE    REG_SZ    100000\r\n    PATH    REG_SZ    C:\\Some\\path\r\n    VERSION    REG_SZ    1.0.0\r\n some other data...

I need to extract the path from it, so I use a regular expression like this:

(?<=(PATH.*?REG_SZ)).+?(?=\\r\\n)

But it doesn't work; as I understand it, that is because the lookaround is atomic. So far the best I've managed is something like this:

(?<=PATH).+?(?=\\r\\n)

which captures

    REG_SZ    C:\\Some\\path

My question is: is it possible to extract the path in one go (that is, without using two regular expressions)?

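You can do it this way, with a capturing group instead of a lookbehind:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "some data ... PACKAGE_SIZE    REG_SZ    100000\r\n    PATH    REG_SZ    C:\\Some\\path\r\n    VERSION    REG_SZ    1.0.0\r\n some other data";
// Match the key and the type, then capture the rest of the line in group 1.
Pattern p = Pattern.compile("PATH\\s+REG_SZ\\s+(.*)\\r\\n");
Matcher m = p.matcher(data);
if (m.find())
    System.out.println(m.group(1));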

Output: C:\Some\path
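
If you need more than one value from the same dump, the capture-group idea above can be wrapped in a small helper. This is only a sketch: the class name RegistryValueDemo and the method extractValue are made up for illustration, and it assumes every value of interest has type REG_SZ and sits on a single line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegistryValueDemo {

    // Hypothetical helper (not part of any library): returns the REG_SZ value
    // stored under the given name, or an empty Optional if it is not found.
    static Optional<String> extractValue(String data, String name) {
        // \b keeps "PATH" from matching inside a longer name such as "CLASSPATH".
        // Pattern.quote treats the name as literal text, not as regex syntax.
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name) + "\\s+REG_SZ\\s+([^\\r\\n]+)");
        Matcher m = p.matcher(data);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String data = "some data ... PACKAGE_SIZE    REG_SZ    100000\r\n"
                + "    PATH    REG_SZ    C:\\Some\\path\r\n"
                + "    VERSION    REG_SZ    1.0.0\r\n some other data";
        System.out.println(extractValue(data, "PATH").orElse("<not found>"));
        System.out.println(extractValue(data, "VERSION").orElse("<not found>"));
    }
}
```

The value is read from group 1, so no lookbehind is needed and the whitespace between the columns can be of any length.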

Try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// subjectString holds the registry output to search.
try {
    Pattern regex = Pattern.compile("(?<=PATH\\s{1,10}REG_SZ\\s{1,10})(\\S[^\r\n]+)(?=\r\n)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        // matched text: regexMatcher.group()
        // match start: regexMatcher.start()
        // match end: regexMatcher.end()
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

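IMPORTANT:

This assumes that the run of whitespace between PATH and REG_SZ, and between REG_SZ and the matched data, is 1 to 10 characters long. The bounded quantifier is there because Java's regex engine generally needs to be able to determine a maximum length for a lookbehind.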
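
For completeness, here is a runnable sketch of the bounded-lookbehind approach applied to the sample data from the question. The class name LookbehindDemo and the method findPaths are invented for this example, and the 1 to 10 whitespace limit is the same assumption as above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {

    // \s{1,10} (rather than \s+) gives the lookbehind a known maximum length.
    // The match itself is one non-space character followed by the rest of the line.
    static final Pattern PATH_VALUE = Pattern.compile(
            "(?<=PATH\\s{1,10}REG_SZ\\s{1,10})\\S[^\\r\\n]*");

    // Hypothetical helper: collects every PATH value found in the input.
    static List<String> findPaths(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = PATH_VALUE.matcher(data);
        while (m.find()) {
            // group() is the path itself; the prefix is only looked at, not consumed.
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "some data ... PACKAGE_SIZE    REG_SZ    100000\r\n"
                + "    PATH    REG_SZ    C:\\Some\\path\r\n"
                + "    VERSION    REG_SZ    1.0.0\r\n some other data";
        for (String path : findPaths(data)) {
            System.out.println(path);
        }
    }
}
```

Here the whole match is the path, so this is the "one go" form the question asks for. The trade-off is the hard-coded upper bound on the whitespace.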

