
How to split 2 strings using regular expression?

I am trying to split a string into two strings using a regular expression.

For example

String original1 = "Calpol Plus 100MG";

The above string should split into

String string1 = "Calpol Plus"; and String string2 = "100MG";

I tried using the .split(" ") method on the string, but it works only if the original string is "Calpol 100MG".

As I am new to regex, I looked at a few regular expressions and came up with "[^0-9MG]", but it still doesn't work on a string like "Syrup 10ML".

I want a general regex that works on both types of string.

Split your input on one or more whitespace characters that come immediately before a <number>MG or <number>ML string.

string.split("\\s+(?=\\d+M[LG])");  // Use the regex "\\s+(?=\\d+(?:\\.\\d+)?M[LG])" instead if floating point numbers are possible.

Example:

String original1 = "Calpol Plus 100MG";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
for (int i=0; i<strs.length; i++) {
  System.out.println(strs[i]);
}

To assign the results to variables:

String original1 = "Calpol Plus 100MG";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
String string1 = strs[0];
String string2 = strs[1];
System.out.println(string1);
System.out.println(string2);

Output:

Calpol Plus
100MG

Code 2:

String original1 = "Syrup 10ML";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
String string1 = strs[0];
String string2 = strs[1];
System.out.println(string1);
System.out.println(string2);

Output:

Syrup
10ML

Explanation:

  • \\s+ Matches one or more whitespace characters.
  • (?=\\d+M[LG]) Positive lookahead that asserts the match must be followed by one or more digits (\\d+) and then by MG or ML. The lookahead consumes nothing, so the number stays in the second part.
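
As a minimal sketch of the decimal-aware variant mentioned above, the split can be wrapped in a small helper. The class name DoseSplitter and the method splitDose are hypothetical names used only for illustration. The limit of 2 and the length check are assumptions added here so that input without a trailing <number>MG or <number>ML part does not throw an ArrayIndexOutOfBoundsException when index 1 is read.

```java
import java.util.regex.Pattern;

public class DoseSplitter {

    // Whitespace that is followed by an integer or decimal number and then MG or ML.
    // Compiled once so that repeated calls do not recompile the regex.
    private static final Pattern BEFORE_DOSE =
            Pattern.compile("\\s+(?=\\d+(?:\\.\\d+)?M[LG])");

    // Hypothetical helper: returns {name, dose}, or a one-element array
    // holding the unchanged input when no dose part is found.
    public static String[] splitDose(String input) {
        return BEFORE_DOSE.split(input, 2);
    }

    public static void main(String[] args) {
        String[] samples = {"Calpol Plus 100MG", "Syrup 10ML", "Syrup 2.5ML", "Plain text"};
        for (String s : samples) {
            String[] parts = splitDose(s);
            if (parts.length == 2) {
                System.out.println(parts[0] + " | " + parts[1]);
            } else {
                System.out.println("no dose found in: " + s);
            }
        }
    }
}
```

Checking the array length before reading strs[1] is worth doing in the snippets above as well if the input is not guaranteed to end with a dose.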


Try something like:

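import java.util.regex.Matcher;
import java.util.regex.Pattern;

String original1 = "Calpol Plus 100MG";
// First alternative: a run of letters and spaces. Second: digits followed by the rest of the input.
// [0-9]+ (rather than [0-9]*) avoids an extra empty match at the end of the string.
Pattern p = Pattern.compile("[A-Za-z ]+|[0-9]+.*");
Matcher m = p.matcher(original1);
while (m.find()) {
      System.out.println(m.group().trim());  // trim() drops the trailing space after "Plus"
}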

I present two solutions:

  • You can create a pattern that matches the whole String and use groups to extract the desired information.
  • You can use a look-ahead assertion to ensure you split in front of a digit.

Which solution works best for you depends on the variety of inputs you have. If you use groups, you will always find the last amount part. If you use split, you may be able to extract more complex amounts more easily, such as "2 tea-spoons" (with the first solution you would need to extend the [A-Za-z] class to include -, e.g. by using [-A-Za-z] instead) or "2.5L" (with the first solution you would need to extend the [0-9] class to include ., e.g. by using [0-9.] instead).

Source:

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Created for http://stackoverflow.com/q/27329519/1266906
 */
public class RecipeSplitter {

    /**
     * {@code ^} anchors the pattern at the start of the String
     * {@code (.*)} matches any characters into group 1
     * {@code \\s+} followed by at least one whitespace character
     * {@code ([0-9]+\\s*[A-Za-z]+)} followed by group 2, which consists of at least one digit, optional whitespace and
     *                               at least one letter
     * {@code $} anchors the pattern at the end of the String
     */
    public static final Pattern INGREDIENT_PATTERN                   = Pattern.compile("^(.*)\\s+([0-9]+\\s*[A-Za-z]+)$");
    /**
     * {@code \\s+} at least one whitespace character
     * {@code (?=[0-9])} the next character is a digit; the lookahead {@code (?=...)} ensures it is there but does not
     *                   include it in the match, so the split does not remove it
     */
    public static final Pattern WHITESPACE_FOLLOWED_BY_DIGIT_PATTERN = Pattern.compile("\\s+(?=[0-9])");

    public static void matchWholeString(String input) {
        Matcher matcher = INGREDIENT_PATTERN.matcher(input);
        if (matcher.find()) {
            System.out.println(
                    "\"" + input + "\" was split into \"" + matcher.group(1) + "\" and \"" + matcher.group(2) + "\"");
        } else {
            System.out.println("\"" + input + "\" was not of the expected format");
        }
    }

    public static void splitBeforeNumber(String input) {
        String[] strings = WHITESPACE_FOLLOWED_BY_DIGIT_PATTERN.split(input);
        System.out.println("\"" + input + "\" was split into " + Arrays.toString(strings));
    }

    public static void main(String[] args) {
        matchWholeString("Calpol Plus 100MG");
        // "Calpol Plus 100MG" was split into "Calpol Plus" and "100MG"
        matchWholeString("Syrup 10ML");
        // "Syrup 10ML" was split into "Syrup" and "10ML"
        splitBeforeNumber("Calpol Plus 100MG");
        // "Calpol Plus 100MG" was split into [Calpol Plus, 100MG]
        splitBeforeNumber("Syrup 10ML");
        // "Syrup 10ML" was split into [Syrup, 10ML]
    }
}
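
The trade-off between the two solutions can be tried out with a small sketch like the one below. The class AmountSplitDemo and its methods byGroups and bySplit are hypothetical names for illustration. STRICT and BEFORE_DIGIT repeat the two patterns from the source above, while EXTENDED is an assumed variant that applies both class extensions suggested in the text ([0-9.] and [-A-Za-z]).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmountSplitDemo {

    // Same regex as INGREDIENT_PATTERN in the source above.
    static final Pattern STRICT = Pattern.compile("^(.*)\\s+([0-9]+\\s*[A-Za-z]+)$");
    // Assumed extension: '.' allowed in the number, '-' allowed in the unit.
    static final Pattern EXTENDED = Pattern.compile("^(.*)\\s+([0-9.]+\\s*[-A-Za-z]+)$");
    // Same regex as WHITESPACE_FOLLOWED_BY_DIGIT_PATTERN in the source above.
    static final Pattern BEFORE_DIGIT = Pattern.compile("\\s+(?=[0-9])");

    // Returns {name, amount}, or null when the whole string does not match.
    static String[] byGroups(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    // Splits at every run of whitespace that is followed by a digit.
    static String[] bySplit(String input) {
        return BEFORE_DIGIT.split(input);
    }

    public static void main(String[] args) {
        String[] samples = {"Sugar 2 tea-spoons", "Water 2.5L", "Omega 3 500MG"};
        for (String s : samples) {
            System.out.println(s);
            System.out.println("  groups (strict):   " + Arrays.toString(byGroups(STRICT, s)));
            System.out.println("  groups (extended): " + Arrays.toString(byGroups(EXTENDED, s)));
            System.out.println("  split:             " + Arrays.toString(bySplit(s)));
        }
    }
}
```

With an input such as "Omega 3 500MG", the group-based patterns keep "Omega 3" together and return only the last amount, whereas the split produces three parts because it cuts before every number.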
