

String.split() - matching leading empty String prior to first delimiter?

I need to split an input String by commas, semicolons, or whitespace (or any mix of the three). I would also like to treat multiple consecutive delimiters in the input as a single delimiter. Here's what I have so far:

String regex = "[,;\\s]+";    
return input.split(regex);

This works, except when the input string starts with one of the delimiter characters, in which case the first element of the result array is an empty String. I do not want my result to contain empty Strings, so an input like ",,,,ZERO; , ;;ONE ,TWO;," should return just a three-element array containing the capitalized Strings.

Is there a better way to do this than stripping out any leading characters that match my regex prior to invoking String.split?

Thanks in advance!

No, there isn't. split() can only ignore trailing delimiters, which it does when 0 is passed as the limit (the second parameter); the one-argument split(regex) already behaves this way:

return input.split(regex, 0);

but for leading delimiters, you'll have to strip them first:

return input.replaceFirst("^"+regex, "").split(regex, 0);
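One edge case is worth checking with this approach: if the input is empty or consists only of delimiters, the stripped string is empty, and "".split(regex) returns a one-element array holding the empty string. A minimal sketch that guards against this, assuming the regex from the question; the class and method names are hypothetical:

```java
class DelimiterSplit {
    // Same delimiter class as in the question: commas, semicolons, whitespace.
    private static final String REGEX = "[,;\\s]+";

    // Hypothetical helper: strip leading delimiters, then split.
    static String[] tokens(String input) {
        String stripped = input.replaceFirst("^" + REGEX, "");
        // "".split(...) yields [""], so return an empty array explicitly.
        if (stripped.isEmpty()) {
            return new String[0];
        }
        return stripped.split(REGEX);
    }

    public static void main(String[] args) {
        String[] result = tokens(",,,,ZERO; , ;;ONE ,TWO;,");
        System.out.println(String.join("|", result)); // prints ZERO|ONE|TWO
    }
}
```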

If by "better" you mean higher performance, then you might want to try writing a regular expression that matches the tokens you want to keep, calling Matcher.find in a loop, and collecting the matches as you find them. This avoids modifying the string first. But measure it yourself to see which is faster for your data.
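A rough sketch of the Matcher.find approach, assuming the tokens are simply runs of characters that are not delimiters (the inverse of the character class in the question). The class and method names are made up for illustration, and no performance claim is implied:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    // Matches a run of characters that are NOT commas, semicolons, or whitespace.
    private static final Pattern TOKEN = Pattern.compile("[^,;\\s]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // Each successful find() advances to the next token; delimiters are skipped.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens(",,,,ZERO; , ;;ONE ,TWO;,")); // prints [ZERO, ONE, TWO]
    }
}
```

Because the pattern matches tokens and not separators, leading and trailing delimiters never produce empty strings, and a delimiter-only input gives an empty list.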

If by "better" you mean simpler, then no, I don't think there is a simpler way than the one you suggested: removing the leading separators before applying the split.

Pretty much all splitting facilities built into the JDK are broken one way or another. You'd be better off using a third-party class such as Guava's Splitter, which is both flexible and correct in how it handles empty tokens and whitespace:

Splitter.on(CharMatcher.anyOf(";,").or(CharMatcher.whitespace()))
    .omitEmptyStrings()
    .split(",,,ZERO;,ONE TWO");

will yield an Iterable<String> containing "ZERO", "ONE", "TWO".

You could also use StringTokenizer to build the list, depending on what you need to do with it:

StringTokenizer st = new StringTokenizer(",,,ZERO;,ONE TWO", ",; ", false);
while(st.hasMoreTokens()) {
  String str = st.nextToken();
  //add to list, process, etc...
}

As a caveat, however, you'll need to list each potential whitespace character separately in the delimiter string passed as the second argument to the constructor.
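To illustrate the caveat, here is a sketch with the delimiter string spelled out to cover the comma, the semicolon, and the ASCII whitespace characters that \s matches by default in java.util.regex. The class and method names are hypothetical, and other Unicode whitespace is not covered:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSplit {
    // Comma, semicolon, then space, tab, newline, vertical tab, form feed,
    // carriage return. Every character in this string is its own delimiter.
    private static final String DELIMS = ",; \t\n\u000B\f\r";

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        // The two-argument constructor does not return delimiters as tokens.
        StringTokenizer st = new StringTokenizer(input, DELIMS);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens(",,,ZERO;,ONE\tTWO")); // prints [ZERO, ONE, TWO]
    }
}
```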
