How to split a string and keep specific delimiters?

I'm writing code that accepts calculator input from the user, and I decided to tokenize the input string with a regular expression. However, the tokenizer fails my unit tests for decimal numbers and for "]".

I started with the lookahead and lookbehind technique from this answer: https://stackoverflow.com/a/2206432/

I wrote "((?<=[+-/*(){^}[%]π])|(?=[+-/*(){^}[%]π]))", which compiled and ran, but it failed whenever the input contained a number with a decimal point.
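One likely explanation for the decimal problem, which this small check is meant to confirm: inside a character class, an unescaped - between two characters defines a range, so +-/ stands for every character from + (U+002B) through / (U+002F). That range includes the comma, the hyphen and the dot. The class name RangeCheck is just for illustration.

```java
import java.util.regex.Pattern;

public class RangeCheck {
    // Inside brackets, "+-/" is a range from '+' to '/', not three separate characters.
    static final Pattern RANGE = Pattern.compile("[+-/]");

    static boolean inRange(char c) {
        return RANGE.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Walk the characters from '*' (0x2A) to '0' (0x30): one below and one above the range.
        for (char c = '*'; c <= '0'; c++) {
            System.out.println("'" + c + "' matches [+-/]: " + inRange(c));
        }
    }
}
```

If this is the cause, writing the hyphen as \\- (or putting it first or last in the class) should make it a literal character instead of a range operator.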

I then tried it the way the accepted answer in the linked question does, using "[+-/*\\^%(){}[]]" (regex3 below). I tried it both with and without the π, since my first suspicion was that π was the character causing the issue, but in both cases it threw:

Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 41 ((?<=[+-/*\^%(){}[]])|(?=[+-/*\^%(){}[]]))

Next, I went back to my first attempt and rearranged the characters: "((?<=[+-/*^%(){}[]π])|(?=[+-/*^%(){}[]π]))" (regex2 below). This one threw the same PatternSyntaxException, reported at the last parenthesis.
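A sketch for checking which of these patterns compile, assuming the nested-class syntax of java.util.regex is the culprit: an unescaped [ inside a character class opens a nested class rather than matching a literal bracket, and a ] directly after that [ appears to be taken as a literal. In [{}[]] the inner class therefore swallows both closing brackets, and the outer class is never closed. The helper compileError is hypothetical; it only wraps Pattern.compile.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NestedClassCheck {
    // Returns null if the pattern compiles, otherwise the exception's description.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            // regex3 from the question: '[' and ']' left unescaped
            "[+-/*\\^%(){}[]]",
            // regex2 from the question (U+03C0 is the pi character)
            "((?<=[+-/*^%(){}[]\u03C0])|(?=[+-/*^%(){}[]\u03C0]))",
            // same set with '-', '^', '[' and ']' escaped
            "[\\-+*/\\^%(){}\\[\\]\u03C0]"
        };
        for (String regex : candidates) {
            String error = compileError(regex);
            System.out.println(regex + " -> " + (error == null ? "compiles" : error));
        }
    }
}
```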

It's probably easier to show the problem in code, so I wrote a class that runs all three regex attempts:

import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

public class RegexProblem {
    /** This delimiter format string came from https://stackoverflow.com/a/2206432/ */
    public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Split on and include + - * / ^ % ( ) [ ] { } π
    public static void main(String[] args) {

        String regex1 = "((?<=[+-/*(){^}[%]π])|(?=[+-/*(){^}[%]π]))";
        String regex2 = "((?<=[+-/*^%(){}[]π])|(?=[+-/*^%(){}[]π]))";
        String regex3 = "[+-/*\\^%(){}[]]";

        String str  = "1.2+3-4^5*6/(78%9π)+[{0+-1}*2]";
        String str2 = "[1.2+3]*4";

        String[] expected = {"1.2","+","3","-","4","^","5","*","6","/","(","78","%",
                             "9","π",")","+","[","{","0","+","-","1","}","*","2","]"};
        String[] expected2 = {"[","1.2","+","3","]","*","4"};

        System.out.println("Expected: ");
        System.out.println("str: " + Arrays.toString(expected));
        System.out.println("str2: " + Arrays.toString(expected2));
        System.out.println();

        printSplits("Regex1", regex1, str, str2);
        printSplits("Regex2", regex2, str, str2);
        printSplits("Regex3", String.format(WITH_DELIMITER, regex3), str, str2);
    }

    // Splits both inputs with the given regex and prints the results.
    // A PatternSyntaxException is printed instead of thrown, so every attempt runs.
    private static void printSplits(String label, String regex, String str, String str2) {
        System.out.println(label + ": ");
        try {
            System.out.println("str: " + Arrays.toString(str.split(regex)));
            System.out.println("str2: " + Arrays.toString(str2.split(regex)));
        } catch (PatternSyntaxException e) {
            System.out.println(e.getMessage());
        }
        System.out.println();
    }
}
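For comparison, a sketch of the same WITH_DELIMITER approach with an explicit operator class in which the hyphen, caret and both square brackets are escaped, so every character in the set is a literal. The class and method names are made up for this example, and π is written as the escape \u03C0. It assumes Java 8 or later, where a zero-width match at the very start of the input does not produce a leading empty string (this matters for str2, which begins with "[").

```java
import java.util.Arrays;

public class EscapedDelimiterSplit {
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
    // '-', '^', '[' and ']' are escaped; U+03C0 is the pi character.
    static final String OPERATORS = "[\\-+*/\\^%(){}\\[\\]\u03C0]";
    static final String SPLIT_REGEX = String.format(WITH_DELIMITER, OPERATORS);

    // Splits before and after every operator or bracket, keeping each as its own token.
    static String[] tokenize(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("1.2+3-4^5*6/(78%9\u03C0)+[{0+-1}*2]")));
        System.out.println(Arrays.toString(tokenize("[1.2+3]*4")));
    }
}
```

Because "." is no longer inside any range, "1.2" stays in one piece, and because "]" is now an escaped member of the class, it is split off like the other delimiters.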

Both regex2 and regex3 fail, but what baffles me is regex1: it compiles even though it appears to have the same number of closing brackets as the others, and it splits on "." but not on "]".
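A likely reading of regex1, which the check below is meant to confirm: Java lets character classes nest (a union), so the [ before % opens an inner class [%], and the ] right after % closes only that inner class. The ] after π then closes the outer class, so the brackets balance and the pattern compiles, but no literal ] ends up in the set. The "." comes in through the +-/ range. The class name Regex1Check is illustrative.

```java
import java.util.regex.Pattern;

public class Regex1Check {
    // The character class used inside regex1 (U+03C0 is the pi character).
    static final Pattern REGEX1_CLASS = Pattern.compile("[+-/*(){^}[%]\u03C0]");

    static boolean matches(String s) {
        return REGEX1_CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // '.' via the +-/ range, '%' via the nested [%] union, ']' and '[' not in the set.
        for (String s : new String[] {".", "%", "\u03C0", "]", "["}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```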

Try this:

(?<=[^\\d.])|(?=[^\\d.])

Explanation:

  • \\d is shorthand for [0-9], i.e. any digit.
  • . inside square brackets matches a literal dot, which in your example input only ever appears as part of a number. Therefore, [\\d.] is what we'll use to identify number characters.
  • [^\\d.] matches any non-number character (a caret ^ at the start of a character class negates it).
  • (?<=[^\\d.]) matches a position that's preceded by a non-number character.
  • The alternative (?=[^\\d.]) matches a position that's followed by a non-number character.
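A minimal way to try this pattern against the question's first input and compare it with the expected tokens (including the "/"). NonNumberSplit and tokenize are illustrative names. Digits and dots stay together because neither lookaround fires between two number characters, while every other character gets a split on both sides. The leading "[" in "[1.2+3]*4" relies on Java 8 or later not emitting a leading empty string for a zero-width match at position 0.

```java
import java.util.Arrays;

public class NonNumberSplit {
    // Split at any position preceded or followed by a character that is
    // neither a digit nor '.', so each operator or bracket becomes its own token.
    static final String SPLIT_REGEX = "(?<=[^\\d.])|(?=[^\\d.])";

    static String[] tokenize(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // U+03C0 is the pi character.
        String str = "1.2+3-4^5*6/(78%9\u03C0)+[{0+-1}*2]";
        String[] expected = {"1.2","+","3","-","4","^","5","*","6","/","(","78","%",
                             "9","\u03C0",")","+","[","{","0","+","-","1","}","*","2","]"};
        String[] actual = tokenize(str);
        System.out.println(Arrays.toString(actual));
        System.out.println("matches expected: " + Arrays.equals(expected, actual));
    }
}
```

Note that this treats every non-number character as a one-character token, including letters and spaces, so whitespace would need to be stripped or rejected separately.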

The technical posts on this site are licensed under CC BY-SA 4.0.