
Java: Pattern matcher returns new lines unexpectedly

I have a use case where I have to handle escaped and unescaped characters as delimiters when splitting a sentence. So far, the unescaped/escaped characters we have are:

" " (space),"\\t","|", "\\|",";","\\;","," etc

This works so far with a regex defined as:

String delimiter = " ";
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\]++)*";

The input string is:

String input = "234|Tamarind|something interesting ";

Now, below is the code that splits and prints:

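 // The compile step was not shown in the post; the DOTALL flag is assumed from the answer's demo.
 Pattern pattern = Pattern.compile( regex, Pattern.DOTALL );
 List<String> matchList = new ArrayList<>(  );
 Matcher regexMatcher = pattern.matcher( input );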
 while ( regexMatcher.find() )
 {
     matchList.add( regexMatcher.group() );
 }

 System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
 matchList.stream().forEach( System.out::println );

However, extra empty strings (which print as blank lines) are being stored unexpectedly, so the output looks like:

Unescaped/escaped test result with size: 5
234|Tamarind|something

interesting

.

Is there a better way to do this so that there won't be any extra strings?

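It is easy: make sure every match consumes at least one character. With the outer * quantifier, the pattern can also match an empty string at every position where no token starts (at each delimiter and at the end of the input), and find() reports those empty matches too. Replace the outer * with +; the inner ++ quantifier can then be removed.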
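To see where the extra entries come from, here is a small sketch that lists each match together with its start and end offsets, for both the original pattern and the fixed one. The class name EmptyMatchDemo and the helper describeMatches are hypothetical names introduced only for this illustration; the patterns and the input are the ones from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Hypothetical helper: returns one "start-end:'text'" entry per match found.
    static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "234|Tamarind|something interesting ";

        // Original pattern: the outer * allows zero-length matches.
        describeMatches("(?:\\\\.|[^ \\\\]++)*", input).forEach(System.out::println);

        System.out.println("---");

        // Fixed pattern: the outer + requires at least one character per match.
        describeMatches("(?:\\\\.|[^ \\\\])+", input).forEach(System.out::println);
    }
}
```

With the original pattern, the list contains zero-length entries at offsets 22 and 34 (the two spaces) and at 35 (the end of the input), which accounts for the size of 5. With the fixed pattern only the two real tokens remain.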

Full Java demo:

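import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String delimiter = " ";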
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\])+";
// System.out.println(regex); // => (?:\\.|[^ \\])+
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
String input = "234|Tamarind|something interesting ";
List<String> matchList = new ArrayList<>(  );
Matcher regexMatcher = pattern.matcher( input );
while ( regexMatcher.find() )
{
    // System.out.println("'"+regexMatcher.group()+"'");
    matchList.add( regexMatcher.group() );
}

System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
matchList.stream().forEach( System.out::println );

Output:

Unescaped/escaped test result with size: 2
234|Tamarind|something
interesting
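The same pattern can be checked against the escaped-delimiter case from the question. The following is a minimal sketch, not a definitive implementation: the class EscapedDelimiterDemo and the method tokenize are hypothetical names, and it assumes the delimiter is a single character with no special meaning inside a regex character class (so not ], ^, - or a backslash).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedDelimiterDemo {
    // Hypothetical helper: collects every run of escaped characters and
    // non-delimiter characters. Assumes the delimiter is safe to place
    // inside a character class without extra escaping.
    static List<String> tokenize(String input, String delimiter) {
        String regex = "(?:\\\\.|[^" + delimiter + "\\\\])+";
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An escaped space stays inside the token; the unescaped space splits.
        System.out.println(tokenize("a\\ b c", " "));
        // The same input as in the post, split on "|" instead of a space.
        System.out.println(tokenize("234|Tamarind|something interesting ", "|"));
        // An escaped ";" stays inside the token; the unescaped ";" splits.
        System.out.println(tokenize("x\\;y;z", ";"));
    }
}
```

Note that the tokens keep the escaping backslash (for example, the first token of the first call is a\ b), so a separate unescaping step would be needed if the raw text is wanted.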
