
Java regex - split string with leading special characters

I am trying to split a string that contains whitespace and special characters. The string starts with special characters. When I run the code, the first array element is an empty string.

String s = ",hm  ..To?day,.. is not T,uesday.";
String[] sArr = s.split("[^a-zA-Z]+\\s*");

Expected result is ["hm", "To", "day", "is", "not", "T", "uesday"]

Can someone explain why this is happening?

Actual result is ["", "hm", "To", "day", "is", "not", "T", "uesday"]

split is behaving as documented. The string begins with a delimiter match (the leading comma), so the text before that match, which is a zero-length string, becomes the first element of the result.
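To see the behaviour in isolation, here is a minimal sketch that applies the question's regex to a shortened input, once with a leading delimiter and once without. The class name `LeadingEmptyDemo` and the helper `splitOnNonLetters` are made up for this illustration.

```java
import java.util.Arrays;

public class LeadingEmptyDemo {
    // Hypothetical helper: splits with the same regex as in the question.
    static String[] splitOnNonLetters(String s) {
        return s.split("[^a-zA-Z]+\\s*");
    }

    public static void main(String[] args) {
        // The delimiter matches at index 0, so an empty leading element appears.
        System.out.println(Arrays.toString(splitOnNonLetters(",hm  ..To?day")));
        // prints [, hm, To, day]

        // The input starts with a letter, so there is no empty leading element.
        System.out.println(Arrays.toString(splitOnNonLetters("hm  ..To?day")));
        // prints [hm, To, day]
    }
}
```

Since whitespace is already a non-letter, the trailing `\\s*` in the regex never consumes anything; it is kept here only to match the question.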

To fix, first remove all splitting chars from the start of the string:

String[] sArr = s.replaceAll("^([^a-zA-Z]*\\s*)*", "").split("[^a-zA-Z]+\\s*");

Note that I've altered the removal regex to trim any sequence of spaces and non-letters from the front.

You don't need to remove anything from the tail, because split (with its default limit of 0) discards trailing empty strings from the result.
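The following sketch shows both points on the question's input: stripping the leading delimiters removes the empty first element, and the trailing "." does not leave an empty last element. It uses a simplified removal regex, `^[^a-zA-Z]+`, which is an assumption of this example rather than the pattern from the answer; the class and method names are also made up.

```java
import java.util.Arrays;

public class StripThenSplit {
    // Hypothetical helper: drop the leading run of non-letters, then split.
    static String[] words(String s) {
        return s.replaceFirst("^[^a-zA-Z]+", "").split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String s = ",hm  ..To?day,.. is not T,uesday.";
        System.out.println(Arrays.toString(words(s)));
        // prints [hm, To, day, is, not, T, uesday]

        // With a negative limit, split keeps trailing empty strings.
        System.out.println(Arrays.toString("uesday.".split("[^a-zA-Z]+", -1)));
        // prints [uesday, ]
    }
}
```

One edge case to keep in mind: if the input contains no letters at all, the stripped string is empty and split returns a one-element array holding the empty string.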

I would simplify this into a two-step process rather than trying to do everything in a single regex split() operation:

String[] sArr = s.replaceAll("[^a-zA-Z]+", " ").trim().split(" ");
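Wrapped in a method, the two steps might look like the sketch below. The guard for input without any letters is an addition of this example, not part of the answer, and the names `NormalizeThenSplit` and `words` are hypothetical.

```java
import java.util.Arrays;

public class NormalizeThenSplit {
    // Hypothetical helper implementing the two-step approach.
    static String[] words(String s) {
        // Step 1: collapse every run of non-letters into one space, then trim the ends.
        String normalized = s.replaceAll("[^a-zA-Z]+", " ").trim();
        // Guard (an assumption of this sketch): "".split(" ") would return [""].
        if (normalized.isEmpty()) {
            return new String[0];
        }
        // Step 2: split on the single spaces that remain.
        return normalized.split(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words(",hm  ..To?day,.. is not T,uesday.")));
        // prints [hm, To, day, is, not, T, uesday]
        System.out.println(Arrays.toString(words(",.. ?")));
        // prints []
    }
}
```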
