Get key value strings from an input string using regex?

I am trying to find key/value pairs in a string using regex (not sure if that is wise!). Here is my string:

key1=key1 value key2=key2 value_key3=something key3=key3_value

key1, key2 and key3 are the keys. As you can see, the values can contain spaces, and there is a catch: the value of key2 itself contains key3 (key2 value_key3=something). Sorry, this is how my input is.

It's not over yet. The keys can appear in any order, as below:

key3=key3_value key1=key1 value key2=key2 value_key3=something
key2=key2 value_key3=something key1=key1 value key3=key3_value

Now I want a regex that finds the right groups for keys and values, so I can later build key/value pairs like:

key1=key1 value
key2=key2 value_key3=something
key3=key3_value

I tried the regex key1=(.*)key2=(.*)key3=(.*), but that works only for the first string. If I change the order of the keys, as in the 2nd and 3rd strings, it no longer matches.

Do each key separately:

String key1value = input.replaceAll(".*\\bkey1= *(\\S+).*", "$1");
// similar for other keys

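This extracts the run of non-space characters after "key1=", so a value that contains spaces is cut at the first space. The key3 that appears inside a value is skipped because of the word boundary \b required before the key: in value_key3 the underscore is a word character, so there is no boundary in front of key3.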
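To see what the per-key replaceAll call returns on the question's sample string, here is a minimal sketch. The class name PerKeyExtract and the helper firstToken are made up for illustration, and the matches() guard is an addition (replaceAll hands back the whole input when nothing matches).

```java
import java.util.regex.Pattern;

// Sketch only: PerKeyExtract and firstToken are hypothetical names.
class PerKeyExtract {
    // Returns the run of non-space characters after "<key>=", or null if the key is absent.
    static String firstToken(String input, String key) {
        String regex = ".*\\b" + Pattern.quote(key) + "= *(\\S+).*";
        // replaceAll returns the input unchanged when nothing matches, so test first.
        return input.matches(regex) ? input.replaceAll(regex, "$1") : null;
    }

    public static void main(String[] args) {
        String input = "key1=key1 value key2=key2 value_key3=something key3=key3_value";
        System.out.println(firstToken(input, "key1")); // key1
        System.out.println(firstToken(input, "key2")); // key2
        System.out.println(firstToken(input, "key3")); // key3_value
    }
}
```

Note that key1 comes back as "key1" rather than "key1 value": \S+ stops at the first space, so this only fits inputs whose values have no spaces.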

This might get you started:

(\w+)=((?:(?!\bkey\w+=).)+)

See a demo on regex101.com.

In my opinion, the most difficult part will be telling a key3= that belongs to a value (as in key2=key2 value_key3=something) apart from a key3= that starts a new pair.
For a better answer, please provide some real input strings.
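A rough Java sketch of how the pattern above could be applied with Matcher.find(). It assumes, as the pattern itself does, that every real key starts with the literal prefix "key"; the class name TemperedKeyValue and the trim() call are additions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: TemperedKeyValue is a hypothetical name.
class TemperedKeyValue {
    // Assumes every real key starts with "key", as in the question's samples.
    // The value is consumed one character at a time, stopping before the next
    // position where a key (at a word boundary) followed by "=" begins.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=((?:(?!\\bkey\\w+=).)+)");

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // The captured value includes the space before the next key, so trim it.
            pairs.put(m.group(1), m.group(2).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("key3=key3_value key1=key1 value key2=key2 value_key3=something"));
        // {key3=key3_value, key1=key1 value, key2=key2 value_key3=something}
    }
}
```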

Maybe this will help you:

\b([a-z\d]+)=(.*?)(?=\b[a-z\d]+=|$)

It depends on keys consisting of alphanumeric characters only, though. If keys can contain underscores, as the value does in your example, it fails. And if keys can contain capital letters, the ignore-case flag must be set.

It captures a key (letters and digits allowed), matches an =, and then captures everything up to the next key or the end of the line.

Check it out at regex101.
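In Java the same pattern might be used as below. This is a sketch under the restriction already stated (keys are alphanumeric only); Pattern.CASE_INSENSITIVE stands in for the ignore-case flag, and the class name LazyKeyValue is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: LazyKeyValue is a hypothetical name.
class LazyKeyValue {
    // CASE_INSENSITIVE lets [a-z\d] also accept upper-case ASCII letters in keys.
    private static final Pattern PAIR = Pattern.compile(
            "\\b([a-z\\d]+)=(.*?)(?=\\b[a-z\\d]+=|$)", Pattern.CASE_INSENSITIVE);

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // The lazy group runs up to the next key, so it carries a trailing space.
            pairs.put(m.group(1), m.group(2).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("key1=key1 value key2=key2 value_key3=something key3=key3_value"));
        // {key1=key1 value, key2=key2 value_key3=something, key3=key3_value}
    }
}
```

The key3= inside value_key3=something does not end the value, because the underscore is a word character and \b therefore does not match in front of key3.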

After some serious thought, this is indeed solvable, though a little tricky:

The main problem I was facing was the order of the keys; otherwise the regex key1=(.*)key2=(.*)key3=(.*) would have been sufficient.

So I first work out the order of the keys in the input, by splitting it on whitespace and using Java's indexOf to find the = in each token.

Then I construct the regex at runtime using that order; code below:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class so the snippet compiles on its own; the name is arbitrary.
public class PropParser
{
    public static void main( String[] args )
    {
        List<String> myPropKeys = new ArrayList<String>();
        myPropKeys.add("key1");
        myPropKeys.add("key2");
        myPropKeys.add("key3");

        String input1 = "key1=key1 value key2=key2 value_key3=something key3=key3_value";
        String input2 = "key3=key3_value key1=key1 value key2=key2 value_key3=something";
        String input3 = "key2=key2 value_key3=something key1=key1 value key3=key3_value";

        // All three inputs yield the same three key/value pairs.
        System.out.println( getPropValues(input1, myPropKeys) );
        System.out.println( getPropValues(input2, myPropKeys) );
        System.out.println( getPropValues(input3, myPropKeys) );
    }

    private static Map<String, String> getPropValues( String input, List<String> myPropKeys )
    {
        Map<String, String> propValues = new HashMap<String, String>();

        // Step 1: collect the known keys in the order they appear in the input.
        // A token such as "value_key3=something" is skipped because
        // "value_key3" is not one of the known keys.
        StringTokenizer tokens = new StringTokenizer( input );
        List<String> propKeyList = new ArrayList<String>();
        while ( tokens.hasMoreTokens() )
        {
            String token = tokens.nextToken();
            int equalsIndex = token.indexOf( "=" );
            if ( equalsIndex != -1 )
            {
                String propertyToken = token.substring( 0, equalsIndex );
                if ( myPropKeys.contains( propertyToken ) )
                {
                    propKeyList.add( propertyToken );
                }
            }
        }

        // Step 2: build the regex in that order, e.g. key2=(.*)key1=(.*)key3=(.*).
        // Pattern.quote keeps keys literal even if they contain regex metacharacters.
        StringBuilder sb = new StringBuilder();
        for ( String propKey : propKeyList )
        {
            sb.append( Pattern.quote( propKey ) ).append( "=(.*)" );
        }

        // Step 3: group i holds the value of the i-th key. Each greedy (.*) backs off
        // to the last occurrence of the next key, so in the sample inputs the key3=
        // inside key2's value does not end that value early.
        Matcher m = Pattern.compile( sb.toString() ).matcher( input );
        if ( m.find() )
        {
            for ( int i = 1; i <= propKeyList.size(); i++ )
            {
                propValues.put( propKeyList.get( i - 1 ), m.group( i ).trim() );
            }
        }

        return propValues;
    }
}
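Since the keys are known up front, a possible variation (not the method above, just a sketch) is to put them into one alternation and let Matcher.find() walk the pairs, which avoids working out the key order first. The class name KnownKeyParser is hypothetical, and the sketch assumes pairs are separated by whitespace.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: KnownKeyParser is a hypothetical name.
class KnownKeyParser {
    static Map<String, String> parse(String input, List<String> keys) {
        // Build the alternation key1|key2|key3, quoting each key literally.
        StringJoiner alternation = new StringJoiner("|");
        for (String key : keys) {
            alternation.add(Pattern.quote(key));
        }
        // A pair is a known key at a word boundary, "=", then the shortest text that is
        // followed by whitespace plus another known key and "=", or by the end of input.
        Pattern pair = Pattern.compile(
                "\\b(" + alternation + ")=(.*?)(?=\\s+(?:" + alternation + ")=|\\s*$)");

        Map<String, String> pairs = new LinkedHashMap<>(); // keeps input order
        Matcher m = pair.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("key1", "key2", "key3");
        System.out.println(parse("key2=key2 value_key3=something key1=key1 value key3=key3_value", keys));
        // {key2=key2 value_key3=something, key1=key1 value, key3=key3_value}
    }
}
```

The lookahead requires whitespace in front of the next key, so value_key3= is never taken for the start of a new pair.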
