How to extract parts of a particular string with a regular expression?

I'm trying to solve the test-driven development exercise at http://osherove.com/tdd-kata-1/ and I'm stuck near the end of the requirements.

I've always dreaded regular expressions, but it seems that I'll have to learn them. Anyway, I'm trying to do the following: take a string, extract the numbers from it and sum them. The requirement that's troubling me is this one:

Allow multiple delimiters like this: “//[delim1][delim2]\n” for example “//[*][%]\n1*2%3” should return 6. Make sure you can also handle multiple delimiters with length longer than one char.

The requirement means that I'll have to extract delim1, delim2, etc. from the part of the string that begins with // and ends with a newline character \n, and then use these delimiters to extract the numbers after the \n. Each delimiter is surrounded by square brackets.

Now, how can I do that in Java with a regular expression?

What I have up till now is the following code that covers the requirements in the above link:

import java.util.ArrayList;

public class Calculator {

    public String getDelimiter(String input) {
        String delimiter = "";
        String changeDelimiter = input.split("\\n")[0];
        delimiter = changeDelimiter.substring(2);
        return delimiter;
    }

    public int calculate(String input) {
        String[] numbers;

        if (input.contains("//")) {
            String delimiter = getDelimiter(input);
            System.out.println("aaaaaaaaaaaaaaaaaaaaaaa : " + delimiter); //testing the value
            String calculation = input.split("\\n")[1];
            numbers = calculation.split("[" + delimiter + "]+");
            System.out.println("bbbbbbbbbbbbbbbbbbbbbbbb"); //testing the values
            for (String number : numbers) {
                System.out.print(number + ":");
                // System.out.print(Integer.parseInt(number) + " ");
            }

        } else
            numbers = input.split(",|\\n");

        if (input.isEmpty()) {
            return 0;
        }
        if (input.length() == 1) {
            return Integer.parseInt(input);
        }
        else {
            return getSum(numbers);
        }
    }

    private int getSum(String[] numbers) throws IllegalArgumentException {
        int sum = 0;
        ArrayList<Integer> negatives = new ArrayList<Integer>();
        for (int i = 0; i < numbers.length; i++) {
            if (Integer.parseInt(numbers[i]) < 0) {
                negatives.add(Integer.parseInt(numbers[i]));
            }
            if (Integer.parseInt(numbers[i]) >= 1000) {
                continue;
            } else
                sum += Integer.parseInt(numbers[i]);
        }
        if (negatives.isEmpty()) {
            return sum;
        } else {
            String negativeNumbers = "";
            for (Integer number : negatives) {
                negativeNumbers += number.toString() + " ";
            }
            throw new IllegalArgumentException("Negatives not allowed : " + negativeNumbers);
        }

    }

}

You can use a regex.

\d matches a single digit.

+ is a quantifier that matches the preceding pattern one or more times.

So \d+ matches a run of one or more digits (written "\\d+" in a Java string literal).

Your code would be

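import java.util.regex.Matcher;
import java.util.regex.Pattern;

public int addAllInts(String s)
{
    int temp = 0;
    // \d+ matches each run of one or more digits in s
    Matcher m = Pattern.compile("\\d+").matcher(s);
    while (m.find())
    {
        temp += Integer.parseInt(m.group());
    }
    return temp;
}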

This is longer than just matching every number in the string, but it should also work for delimiters like "delim1", i.e. delimiters that contain digits. I tried to explain the patterns and steps inline.

    final String input = "//[delim1][delim2]\n12delim125delim2";
    // split the input string so you will get anything after // and before \n
    // and anything after \n until end of line ($)
    Pattern p = Pattern.compile("^//(.+)\\n(.*)$");
    Matcher m = p.matcher(input);
    if (!m.matches()) {
      System.out.println("Input string not valid");
      return;
    }

    String delimString = m.group(1);
    String searchString = m.group(2);

    // This matches the opening square bracket,
    // then as a capturing group, anything except a closing bracket. 
    // Finally it matches the closing bracket of the delimiter definition.
    Pattern pDelim = Pattern.compile("\\[([^\\]]+)\\]");
    Matcher mDelim = pDelim.matcher(delimString);

    // build a regex for String.split in the format: delim1|delim2|delim3|...
    String delimiters = "";
    while (mDelim.find()) {
      delimiters += (Pattern.quote(mDelim.group(1)) + "|");
    }
    delimiters = delimiters.substring(0, delimiters.length()-1);

    // split string and convert numbers to integers, then sum them up
    String[] numStrings = searchString.split(delimiters);
    int sum = 0;
    for (String num : numStrings) {
      sum += Integer.parseInt(num);
    }

    System.out.println("Sum: " + sum);

Edit / some more explanation

The regular expression \\[([^\\]]+)\\] (written here as it appears in the Java string literal) contains three parts:

  • \\[ : This matches the opening square bracket of the delimiter definition. Two backslashes are necessary: the Java compiler consumes one of them as a string escape, and the remaining one escapes [ for the regex engine, where [ is a special character.
  • ([^\\]]+) : The outer parentheses create a so-called capturing group, which you can access later using Matcher.group(n), where n is the index of the group. So 1 is the first group defined, 2 the second, and so on; 0 returns the whole match.

    • [^\\]]+ : This matches the content of the delimiter definition, that is, everything inside the square brackets. This time the outer [ and ] are not escaped; they have a special meaning and define a character class. A character class matches any one character specified inside it. For example, [abc] matches a, b or c but not d. A ^ at the beginning of a character class negates it, so [^abc] matches any character except a, b or c.

      The only character listed in our character class is ], so it matches any character except the closing square bracket, which ends the delimiter definition. The + appended to the character class means: match at least one character, and as many as possible.

  • \\] : This simply matches the closing square bracket.
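
To see the bracket pattern on its own, here is a minimal sketch that runs it against a delimiter header and collects group 1 of every match. The class name DelimiterExtractor and the method extract are hypothetical names chosen for this illustration; they are not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class DelimiterExtractor {

    // "[", then one or more characters that are not "]" (captured as group 1), then "]".
    private static final Pattern DELIM = Pattern.compile("\\[([^\\]]+)\\]");

    // Returns the text found inside each pair of square brackets, in order of appearance.
    static List<String> extract(String delimString) {
        List<String> delimiters = new ArrayList<>();
        Matcher m = DELIM.matcher(delimString);
        while (m.find()) {
            delimiters.add(m.group(1)); // group(0) would include the brackets
        }
        return delimiters;
    }

    public static void main(String[] args) {
        System.out.println(extract("[delim1][delim2]")); // prints [delim1, delim2]
        System.out.println(extract("[*][%%%]"));         // prints [*, %%%]
    }
}
```

Because the character class excludes only ], a delimiter may contain any other character, including digits and regex metacharacters such as *.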

With this regex we get the delimiter strings by calling Matcher.find() and Matcher.group(1) in a loop. String.split() also takes a regex as its delimiter parameter, so we now need to build a regex that matches any of the delimiter strings we parsed. Pattern.quote() escapes each delimiter string, which is necessary whenever a delimiter contains a character that the regex engine would otherwise interpret as special, such as * in the kata's example. The quoted delimiters are joined with |, which is itself a regex special character meaning "or", so the resulting regex matches any one of the delimiters and String.split() splits the string on them.
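
The effect of Pattern.quote() is easiest to see with the kata's own example, where the delimiters are * and %. The following is a minimal sketch; the class name SplitRegexBuilder and the method build are hypothetical names for this illustration, and String.join is used here instead of the trailing-| trimming in the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration only.
class SplitRegexBuilder {

    // Quotes every delimiter so it is matched literally, then joins them with "|" (regex "or").
    static String build(List<String> delimiters) {
        List<String> quoted = new ArrayList<>();
        for (String d : delimiters) {
            quoted.add(Pattern.quote(d));
        }
        return String.join("|", quoted);
    }

    public static void main(String[] args) {
        String regex = build(Arrays.asList("*", "%"));
        String[] parts = "1*2%3".split(regex);
        System.out.println(Arrays.toString(parts)); // prints [1, 2, 3]

        // Without quoting, "*" is read as a quantifier with nothing to repeat.
        try {
            "1*2%3".split("*|%");
        } catch (PatternSyntaxException e) {
            System.out.println("Unquoted delimiters are not a valid regex");
        }
    }
}
```

One caveat with this sketch: alternatives are tried from left to right, so if one delimiter is a prefix of another (for example * and **), the shorter one can match first and leave empty strings in the result. Sorting the delimiters by length, longest first, before joining them would avoid that.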
