String.split() - matching leading empty String prior to first delimiter?
I need to be able to split an input String by commas, semicolons or whitespace (or a mix of the three). I would also like to treat multiple consecutive delimiters in the input as a single delimiter. Here's what I have so far:
String regex = "[,;\\s]+";
return input.split(regex);
This works, except when the input string starts with one of the delimiter characters, in which case the first element of the result array is an empty String. I do not want my result to contain empty Strings, so that something like ",,,,ZERO; , ;;ONE ,TWO;," returns just a three-element array containing the capitalized Strings.
Is there a better way to do this than stripping out any leading characters that match my regex prior to invoking String.split?
Thanks in advance!
No, there isn't. split() can only discard trailing empty strings, which it does when the limit argument is 0 (the single-argument split(regex) already behaves this way):
return input.split(regex, 0);
but for leading delimiters, you'll have to strip them first:
return input.replaceFirst("^"+regex, "").split(regex, 0);
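One edge case is worth handling with the strip-then-split approach: if the input consists only of delimiters, the stripped string is empty, and "".split(regex) returns a one-element array containing an empty String. A minimal sketch follows; the class and method names (SplitExample, splitNoEmpties) are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

public class SplitExample {
    // Same delimiter class as in the question: commas, semicolons, whitespace.
    private static final String DELIMS = "[,;\\s]+";

    // Hypothetical helper: strip leading delimiters, then split.
    public static String[] splitNoEmpties(String input) {
        // Remove a leading run of delimiters so split() yields no empty first element.
        String stripped = input.replaceFirst("^" + DELIMS, "");
        // "".split(...) returns an array holding one empty String, so handle it explicitly.
        if (stripped.isEmpty()) {
            return new String[0];
        }
        // Trailing empty strings are dropped by split() on its own.
        return stripped.split(DELIMS);
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(Arrays.toString(splitNoEmpties(",,,,ZERO; , ;;ONE ,TWO;,")));
    }
}
```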
If by "better" you mean higher performance, then you might want to try creating a regular expression that matches what you want to match and using Matcher.find in a loop, pulling out the matches as you find them. This saves modifying the string first. But measure it for yourself to see which is faster for your data.
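The Matcher.find idea could look roughly like this. It is only a sketch: the pattern "[^,;\\s]+" (a run of non-delimiter characters) and the names TokenFinder and tokens are assumptions chosen for illustration, and no claim is made here about which approach is faster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenFinder {
    // Matches one token: a run of characters that are NOT delimiters.
    private static final Pattern TOKEN = Pattern.compile("[^,;\\s]+");

    // Hypothetical helper: collect every token found in the input.
    public static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() advances to the next match, so delimiters are simply skipped.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(tokens(",,,,ZERO; , ;;ONE ,TWO;,"));
    }
}
```

Because the pattern describes tokens rather than separators, leading and trailing delimiters never produce empty entries.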
If by "better" you mean simpler, then no, I don't think there is a simpler way than the one you suggested: removing the leading separators before applying the split.
Pretty much all splitting facilities built into the JDK are broken one way or another. You'd be better off using a third-party class such as Guava's Splitter, which is both flexible and correct in how it handles empty tokens and whitespace:
Splitter.on(CharMatcher.anyOf(";,").or(CharMatcher.WHITESPACE))
.omitEmptyStrings()
.split(",,,ZERO;,ONE TWO");
will yield an Iterable<String> containing "ZERO", "ONE", "TWO". (Newer Guava releases replace the CharMatcher.WHITESPACE constant with the CharMatcher.whitespace() method.)
You could also potentially use StringTokenizer to build the list, depending on what you need to do with it:
StringTokenizer st = new StringTokenizer(",,,ZERO;,ONE TWO", ",; ", false);
while (st.hasMoreTokens()) {
    String str = st.nextToken();
    // add to list, process, etc...
}
As a caveat, however, you'll need to list each potential whitespace character separately in the second argument to the constructor.
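To illustrate that caveat, here is a sketch that adds the whitespace characters StringTokenizer uses by default (space, tab, newline, carriage return, form feed) to the comma and semicolon. The names TokenizerExample and tokens are hypothetical, and other Unicode whitespace characters would still need to be added by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerExample {
    // Comma, semicolon, plus StringTokenizer's default whitespace delimiters.
    private static final String DELIMS = ",; \t\n\r\f";

    // Hypothetical helper: collect all tokens into a list.
    public static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        // 'false' means the delimiters themselves are not returned as tokens.
        StringTokenizer st = new StringTokenizer(input, DELIMS, false);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(tokens(",,,ZERO;\t,ONE\nTWO"));
    }
}
```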