
What would be the regex for this pattern?

My Java program, at a certain point, receives a string containing a couple of key-value properties, like this example:

param1=value Param2=values can have spaces PARAM3=values cant have equal characters

Each parameter's name/key consists of a single word (a-z, A-Z, _ and 0-9) and is followed by an = character (not separated by spaces) and its value. The value is text that can contain spaces and lasts until the end of the string or the beginning of another parameter (which is again a word followed by an equals sign and its value, and so on).

I need to extract a Properties object (string-to-string map) from this string. I was trying to use regex to find each key-value set. The code is like this:

public static Properties createProperties(String str) {
    Properties prop = new Properties();
    Matcher matcher = Pattern.compile(some regex).matcher(str);

    while (matcher.find()) {
        String match = matcher.group();
        String param = ...; // What comes before '='
        String value = ...; // What comes after '='
        prop.setProperty(param, value);
    }

    return prop;
}

But the regex I wrote is not working correctly.

String regex = "(\\w+=.*)+";

Since .* tells the regex to match "anything" it finds, it matches the entire string. I want to tell the regex to search only until it finds another \\w+=.* (a word followed by an equals sign and something after it).

How could I write this regex? Or what would be another solution for the problem using regex?

You can use a Negative Lookahead here.

(\\w+)=((?:(?!\\s*\\w+=).)*)

The key is placed inside capturing group #1 and the value is in capturing group #2. Note that I used \\s* inside the lookahead so that the value does not end with trailing whitespace.
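To show how this pattern plugs into the method from the question, here is a minimal sketch. The class name KeyValueParser and the constant PAIR are made up for illustration; only the regex and the createProperties method name come from the posts above.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
public class KeyValueParser {
    // Group 1 is the key. Group 2 is the value: it consumes one character
    // at a time for as long as the text ahead is NOT optional whitespace
    // followed by "word=" (that is, the start of the next parameter).
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=((?:(?!\\s*\\w+=).)*)");

    public static Properties createProperties(String str) {
        Properties prop = new Properties();
        Matcher matcher = PAIR.matcher(str);
        while (matcher.find()) {
            prop.setProperty(matcher.group(1), matcher.group(2));
        }
        return prop;
    }

    public static void main(String[] args) {
        Properties p = createProperties(
                "param1=value Param2=values can have spaces "
                + "PARAM3=values cant have equal characters");
        // Properties does not guarantee iteration order.
        p.forEach((k, v) -> System.out.println(k + " -> [" + v + "]"));
    }
}
```

With the sample string from the question, param1 maps to "value", Param2 to "values can have spaces" and PARAM3 to "values cant have equal characters". Since . does not match line breaks by default, a value would also stop at a newline unless Pattern.DOTALL is used.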


One way among several:

List<String> paramNames = new ArrayList<String>();
List<String> paramValues = new ArrayList<String>();
Pattern regex = Pattern.compile("([^\\s=]+)=([^\\s=]+)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    paramNames.add(regexMatcher.group(1));
    paramValues.add(regexMatcher.group(2));
}

The regex:

([^\\s=]+)=([^\\s=]+)

The code retrieves keys as Group 1, values as Group 2.

Explanation

  • ([^\\s=]+) captures to Group 1 one or more characters that are neither whitespace nor an equals sign
  • = matches the literal =
  • ([^\\s=]+) captures to Group 2 one or more characters that are neither whitespace nor an equals sign
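As a quick check of what this pattern captures on the sample string from the question, here is a small sketch. The class name NoSpaceValueDemo, the parse method and the use of a LinkedHashMap instead of two lists are choices made for this illustration, not part of the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
public class NoSpaceValueDemo {
    private static final Pattern PAIR =
            Pattern.compile("([^\\s=]+)=([^\\s=]+)");

    // Returns the key/value pairs in the order they are found.
    public static Map<String, String> parse(String subjectString) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher regexMatcher = PAIR.matcher(subjectString);
        while (regexMatcher.find()) {
            result.put(regexMatcher.group(1), regexMatcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "param1=value Param2=values can have spaces "
                + "PARAM3=values cant have equal characters"));
    }
}
```

Because Group 2 excludes whitespace, each value ends at the first space: on the sample string, Param2 and PARAM3 both map to "values". This pattern therefore fits input whose values contain no spaces.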

Your regex would be:

(\\w+=(?:(?!\\w+=).)*)


It captures each param=value pair up to the next param=. On the example string it produces three matches, each holding one param=value pair in group 1.

Explanation:

  • \\w+= matches one or more word characters followed by an = symbol.
  • (?:(?!\\w+=).)* is a non-capturing group with a negative lookahead: it matches any character, one at a time, as long as the text at that position does not start with something of the form \\w+=. So the match extends up to the next param=
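Since this pattern puts the whole pair in a single group, the key and value still have to be separated afterwards. A minimal sketch of one way to do that, assuming a split at the first = and a trim of the value (the class name PairSplitDemo is made up for illustration):

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
public class PairSplitDemo {
    private static final Pattern PAIR =
            Pattern.compile("(\\w+=(?:(?!\\w+=).)*)");

    public static Properties createProperties(String str) {
        Properties prop = new Properties();
        Matcher matcher = PAIR.matcher(str);
        while (matcher.find()) {
            String pair = matcher.group(1);
            // The key is \w+, so the first '=' always ends the key.
            int eq = pair.indexOf('=');
            String param = pair.substring(0, eq);
            // The match runs right up to the next key, so it can end
            // with the separating whitespace; trim() removes it.
            String value = pair.substring(eq + 1).trim();
            prop.setProperty(param, value);
        }
        return prop;
    }

    public static void main(String[] args) {
        Properties p = createProperties(
                "param1=value Param2=values can have spaces "
                + "PARAM3=values cant have equal characters");
        System.out.println(p.getProperty("Param2"));
    }
}
```

Note that trim() also strips leading whitespace from a value, which may or may not be wanted depending on the input.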
