
How do I split/parse this String properly using Regex

I am inexperienced with regex and rusty with Java, so some help here would be appreciated.

So I have a String in the form:

statement|digit|statement

statement|digit|statement

etc.

where statement can be any combination of characters, digits, and spaces. I want to parse this string such that I save the first and last statements of each line in a separate string array.

For example, if I had the string:

cats|1|short hair and long hair

cats|2|black, blue

dogs|1|cats are better than dogs

I want to be able to parse the string into two arrays.

Array one = [cats], [cats], [dogs]

Array two = [short hair and long hair],[black, blue],[cats are better than dogs]

    Matcher m = Pattern.compile("(\\.+)|\\d+|=(\\.+)").matcher(str);

        while(m.find()) {
          String key = m.group(1);
          String value = m.group(2);
          System.out.printf("key=%s, value=%s\n", key, value);
        }

I would have gone on to add the keys and values to separate arrays had my output been right, but no luck. Any help with this would be very much appreciated.

Here is a solution with RegEx:

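import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseString {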
    public static void main(String[] args) {
        String data = "cats|1|short hair and long hair\n"+
                      "cats|2|black, blue\n"+
                      "dogs|1|cats are better than dogs";
        List<String> result1 = new ArrayList<>();
        List<String> result2 = new ArrayList<>();
        Pattern pattern = Pattern.compile("(.+)\\|\\d+\\|(.+)");

        Matcher m = pattern.matcher(data);
        while (m.find()) {
           String key = m.group(1);
           String value = m.group(2);
           result1.add(key);
           result2.add(value);
           System.out.printf("key=%s, value=%s\n", key, value);
        }
    }
}

Here is a great site to help with regular expressions: http://txt2re.com/. Enter some example text in step 1, select the parts you are interested in in step 2, and select a language in step 3. Then copy, paste, and massage the code that it spits out.

Double split should work:

class ParseString
{  
  public static void main(String[] args)
  {  
    String s = "cats|1|short hair and long hair\ncats|2|black, blue\ndogs|1|cats are better than dogs";
    String[] sa1 = s.split("\n");
    for (int i = 0; i < sa1.length; i++)
    {  
      String[] sa2 = sa1[i].split("\\|");
      System.out.printf("key=%s, value=%s\n", sa2[0], sa2[2]);
    } // end for i
  } // end main
} // end class ParseString

Output:

key=cats, value=short hair and long hair
key=cats, value=black, blue
key=dogs, value=cats are better than dogs

The main problem is that you need to escape the | and not the . characters. Also, what is the = doing in your regex? I generalized the regex a little, but you can replace .* with \\d+ to match the same thing as yours.

Matcher m = Pattern.compile("^(.+?)\\|.*\\|(.+)$", Pattern.MULTILINE).matcher(str);

Here is the strict version: "^([^|]+)\\|\\d+\\|([^|]+)$" (also with MULTILINE).
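As a rough sketch of how the strict version might be used to fill the two lists the question asks for: the class name `StrictParseSketch` and the helper `parse` are made up for illustration, and the sample data is taken from the question. Since `[^|]` also matches a newline, this assumes the input contains no blank lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictParseSketch {
    // The strict pattern from the answer; MULTILINE lets ^ and $ match at every line.
    // Assumption: no blank lines in the input, because [^|] also matches '\n'.
    private static final Pattern LINE =
            Pattern.compile("^([^|]+)\\|\\d+\\|([^|]+)$", Pattern.MULTILINE);

    // Hypothetical helper: index 0 holds the first statements, index 1 the last ones.
    static List<List<String>> parse(String str) {
        List<String> first = new ArrayList<>();
        List<String> last = new ArrayList<>();
        Matcher m = LINE.matcher(str);
        while (m.find()) {
            first.add(m.group(1));
            last.add(m.group(2));
        }
        return Arrays.asList(first, last);
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                   + "cats|2|black, blue\n"
                   + "dogs|1|cats are better than dogs";
        List<List<String>> result = parse(str);
        System.out.println("Array one = " + result.get(0));
        System.out.println("Array two = " + result.get(1));
    }
}
```

Lines that do not fit the statement|digit|statement shape are simply skipped by `find()`, which may or may not be what you want.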

It is indeed easier to use split on each line, as some have said, but like this:

String[] parts = str.split("\\|\\d+\\|");

If parts.length is not two then you know it is not a legal line.
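A minimal sketch of that length check, applied line by line. The class name `SplitValidateSketch` and the helper `splitLine` are hypothetical, and returning `null` for an illegal line is just one possible convention.

```java
class SplitValidateSketch {
    // Hypothetical helper: returns {left, right} for a legal line,
    // or null when the split does not yield exactly two parts.
    static String[] splitLine(String line) {
        String[] parts = line.split("\\|\\d+\\|");
        return parts.length == 2 ? parts : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "cats|1|short hair and long hair",
            "this line has no delimiter",
            "dogs|1|cats are better than dogs"
        };
        for (String line : lines) {
            String[] parts = splitLine(line);
            if (parts == null) {
                System.out.println("not a legal line: " + line);
            } else {
                System.out.printf("key=%s, value=%s%n", parts[0], parts[1]);
            }
        }
    }
}
```

Note that `split` drops trailing empty strings, so a line such as `a|1|` yields a single part and is rejected by this check.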

If your input is always formatted like that, you can use this single statement to get the left parts at the even indexes and the right parts at the odd indexes (0: line1-left, 1: line1-right, 2: line2-left, 3: line2-right, 4: line3-left, ...), so the resulting array is twice as long as the number of lines.

String[] parts = str.split("\\|\\d+\\||\\n+");
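To get from that interleaved array to the two arrays the question asks for, something like the following could work. It is only a sketch: the class name `SplitAllSketch` and the helper `parse` are invented here, and it assumes every line is well formed so that the parts really do alternate left, right.

```java
import java.util.Arrays;

class SplitAllSketch {
    // Hypothetical helper: result[0] holds the left parts, result[1] the right parts.
    // Assumes well-formed input, so parts alternate left, right, left, right, ...
    static String[][] parse(String str) {
        String[] parts = str.split("\\|\\d+\\||\\n+");
        String[] left = new String[parts.length / 2];
        String[] right = new String[parts.length / 2];
        for (int i = 0; i + 1 < parts.length; i += 2) {
            left[i / 2] = parts[i];       // even index: left part of a line
            right[i / 2] = parts[i + 1];  // odd index: right part of the same line
        }
        return new String[][] { left, right };
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                   + "cats|2|black, blue\n"
                   + "dogs|1|cats are better than dogs";
        String[][] result = parse(str);
        System.out.println("Array one = " + Arrays.toString(result[0]));
        System.out.println("Array two = " + Arrays.toString(result[1]));
    }
}
```

A single malformed line shifts every later index, so the per-line check is the safer choice when the input is not fully trusted.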

I agree with the other answers that you should use split, but I am providing an answer that uses Pattern.split, since it uses a regex.

import java.util.regex.Pattern;

class MatchExample
{
    public static void main (String[] args) {
        String[] data = {
            "cats|1|short hair and long hair",
            "cats|2|black, blue",
            "dogs|1|cats are better than dogs"
        };
        Pattern p = Pattern.compile("\\|\\d+\\|");
        for(String line: data){

            String[] elements = p.split(line);
            System.out.println(elements[0] + " // " + elements[1]);

        }
    }
}

Notice that the pattern will match on one or more digits between two |'s. I see what you are doing with the groupings.

There is no need for a complex regex pattern. You could simply split the string on the delimiter token using the string's split method ( String#split() ) in Java.

Working Example

import java.util.ArrayList;

public class StackOverFlow31840211 {
    private static final int SENTENCE1_TOKEN_INDEX = 0;
    private static final int DIGIT_TOKEN_INDEX = SENTENCE1_TOKEN_INDEX + 1;
    private static final int SENTENCE2_TOKEN_INDEX = DIGIT_TOKEN_INDEX + 1;

    public static void main(String[] args) {
        String[] text = {
            "cats|1|short hair and long hair",
            "cats|2|black, blue",
            "dogs|1|cats are better than dogs"
        };

        ArrayList<String> arrayOne = new ArrayList<String>();
        ArrayList<String> arrayTwo = new ArrayList<String>();

        for (String s : text) {
            String[] tokens = s.split("\\|");

            int tokenType = 0;
            for (String token : tokens) {
                switch (tokenType) {
                    case SENTENCE1_TOKEN_INDEX:
                        arrayOne.add(token);
                        break;

                    case SENTENCE2_TOKEN_INDEX:
                        arrayTwo.add(token);
                        break;
                }

                ++tokenType;
            }
        }

        System.out.println("Sentences for first token: " + arrayOne);
        System.out.println("Sentences for third token: " + arrayTwo);

    }
}

