Split a string on multiple characters

I want to split a sentence on any one of several characters (listed below). My regex splits on most of them, but not on '[' and ']' (the opening and closing square brackets). If I change the string SPECIAL_CHARACTERS_REGEX to [ :;'=\\()!-\\[\\]] , it starts splitting on the integers in the string instead of on the square brackets. How can I make the regex split on square brackets rather than integers ('[]' denotes all integers)?

A related question: is there a way to also split numbers from letters in the string? E.g., 9pm should be split into 9 and pm.

This:

private static final String SPECIAL_CHARACTERS_REGEX = "[ :;'=\\()!-]";
String rawMessage = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]";
String[] tokens = rawMessage.split(SPECIAL_CHARACTERS_REGEX);

Gives:

Input: let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]
output: [let, s, meet, tomorrow, at, 9, 30p?, 7, 8pm?, i, you, go, , no, Go, , , [to, do, , ]]

And,

This:

private static final String SPECIAL_CHARACTERS_REGEX = "[ :;'=\\()!-\\[\\]]";
String rawMessage = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]";
String[] tokens = rawMessage.split(SPECIAL_CHARACTERS_REGEX);

Gives:
Input: let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]
output: [let, s, meet, tomorrow, at, , , , , p, , , , , pm, , i, you, go, , no, , o, , , , to, do]

Expected output:

{"let", "s", "meet", "tomorrow", "at", "9", "30", "p", "7", "8", "pm", "i", "you", "go", "no", "Go", "to", "do"}

Put the dash at the end of the character class (or at the beginning, or escape it); otherwise it is treated as defining a range of characters:

[ :;'=\\()!\\[\\]-]

Your original regex was matching every character between ! and [ , which includes digits, uppercase letters and a number of other symbols such as ( , ) and so on.
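To make the range effect concrete, here is a small sketch that checks individual characters against both forms of the character class. The class name DashRangeDemo and the helper matches are made up for illustration; only the two patterns come from the discussion above.

```java
import java.util.regex.Pattern;

public class DashRangeDemo {
    // Class from the question: "!-\\[" is read as the range '!' (0x21) to '[' (0x5B).
    static final Pattern RANGE = Pattern.compile("[ :;'=\\()!-\\[\\]]");
    // Same characters with the dash moved to the end, where it is a literal dash.
    static final Pattern LITERAL = Pattern.compile("[ :;'=\\()!\\[\\]-]");

    // Hypothetical helper: does the single character c match the class p?
    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'7', 'G', 'g', '-', '['}) {
            System.out.println("'" + c + "' range=" + matches(RANGE, c)
                    + " literal=" + matches(LITERAL, c));
        }
    }
}
```

Digits and uppercase letters fall inside the '!' to '[' range, which is why the second attempt in the question swallowed 9, 30 and the G of Go, while lowercase letters were left alone.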

To get the result you expect, you might use something like this:

[ ?:;'=\\()!\\[\\]-]+|(?<=\\d)(?=\\D)

(?<=\\d)(?=\\D) splits at the boundary between a digit and a non-digit (you might also use [0-9] and [^0-9] instead, which may be slightly faster).

ideone demo

If you leave the dash in the middle of the character class, you need to escape it as well.

However, you can avoid this by placing it at the beginning or end of your character class. Also, you don't need to escape () here, and you probably want a + quantifier after your character class so that a run of delimiters is treated as one (avoid * , which also matches the empty string and would split between every character).
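As a rough illustration of both points, the sketch below splits a short sample string twice: once with the dash escaped in the middle of the class and no quantifier, and once with the dash at the end plus a + quantifier. The class name EscapedDashDemo, the helper split and the shortened sample string are assumptions made for this example.

```java
import java.util.Arrays;

public class EscapedDashDemo {
    // Hypothetical helper: thin wrapper around String.split for easy testing.
    static String[] split(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String s = "go (no Go!) 7-8";

        // Dash escaped mid-class, no quantifier: adjacent delimiters
        // produce empty tokens.
        System.out.println(Arrays.toString(split(s, "[ ()!\\-\\[\\]]")));
        // prints [go, , no, Go, , , 7, 8]

        // Dash at the end, with +: a run of delimiters counts as one.
        System.out.println(Arrays.toString(split(s, "[ ()!\\[\\]-]+")));
        // prints [go, no, Go, 7, 8]
    }
}
```

Both classes match the same set of characters; only the quantifier changes how many empty strings end up in the result.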

Update: To get your expected results, you could do the following.

private static final String SPECIAL_CHARACTERS_REGEX = "[ :;'?=()!\\[\\]-]+|(?<=\\d)(?=\\D)";
String rawMessage = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]";
String[] tokens = rawMessage.split(SPECIAL_CHARACTERS_REGEX);
System.out.println(Arrays.toString(tokens));

Regular expression:

[ :;'?=()!\[\]-]+    any character of: ' ', ':', ';', ''', '?',
                       '=', '(', ')', '!', '\[', '\]', '-' (1 or more times)
 |                   OR
  (?<=               look behind to see if there is:
   \d                digits (0-9)
  )                  end of look-behind
   (?=               look ahead to see if there is:
    \D               non-digits (all but 0-9)
   )                 end of look-ahead

See Working demo

Output

[let, s, meet, tomorrow, at, 9, 30, p, 7, 8, pm, i, you, go, no, Go, to, do]

Using this in the regex will split at any point where a digit is followed by a letter:

(?<=\\d)(?=[A-Za-z])

I've tested using just the above as the pattern. To add it to what you already have, use | in your regex to split on either the above or what you already have:

String[] parts = s.split("[ :;'=()!\\[\\]-]+|(?<=\\d)(?=[A-Za-z])");

(using hwnd's answer). (?<=...) is a lookbehind, which matches if the pattern immediately before the current position matches, and (?=...) is a lookahead, which matches if the pattern immediately after the current position matches. Neither consumes any characters.
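A minimal sketch of the zero-width split on its own, without any delimiter class, may help show what the lookarounds contribute. The class name LookaroundSplitDemo and the method splitDigitsFromLetters are hypothetical names chosen for this example.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // Zero-width boundary: a digit just behind, a letter just ahead.
    static final String DIGIT_THEN_LETTER = "(?<=\\d)(?=[A-Za-z])";

    // Hypothetical helper: splits only where a digit is followed by a letter.
    static String[] splitDigitsFromLetters(String input) {
        return input.split(DIGIT_THEN_LETTER);
    }

    public static void main(String[] args) {
        // Splits between '9' and 'p'; nothing is removed from the input.
        System.out.println(Arrays.toString(splitDigitsFromLetters("9pm")));
        // prints [9, pm]

        // A letter followed by a digit is not a split point.
        System.out.println(Arrays.toString(splitDigitsFromLetters("pm9")));
        // prints [pm9]
    }
}
```

Because the match is empty, every character of the input survives in the tokens, unlike a split on the delimiter class, where the matched characters are discarded.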

First insert a space inside digit-letter combinations such as 8pm, then split on the special characters, escaping '[' and ']':

String rawMessage  = "let's meet tomorrow at 9:30pm 7-8pm? i=you go (no Go!) [to do !]";
String rawMessage2 = rawMessage.replaceAll("(?<=[0-9])(?=[a-zA-Z])", " ");
// '?' and a trailing '-' are included so that "7-8" and "pm?" are split as well
String[] tokens  = rawMessage2.split("[ :;'?=()!\\[\\]-]+");
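The two-step approach can be sketched as a small runnable class. The names TwoStepSplitDemo and tokenize are invented for this example, and the delimiter class here includes '?' and a trailing '-' as an assumption, so that "7-8" and "pm?" are broken up the way the question's expected output requires.

```java
import java.util.Arrays;

public class TwoStepSplitDemo {
    // Hypothetical helper implementing the two steps described above.
    static String[] tokenize(String rawMessage) {
        // Step 1: insert a space wherever a digit is directly followed by a letter.
        String spaced = rawMessage.replaceAll("(?<=[0-9])(?=[a-zA-Z])", " ");
        // Step 2: split on runs of delimiters. The '?' and trailing '-' are
        // assumptions added for this sketch; the dash is last so it stays literal.
        return spaced.split("[ :;'?=()!\\[\\]-]+");
    }

    public static void main(String[] args) {
        String rawMessage =
                "let's meet tomorrow at 9:30pm 7-8pm? i=you go (no Go!) [to do !]";
        System.out.println(Arrays.toString(tokenize(rawMessage)));
        // prints [let, s, meet, tomorrow, at, 9, 30, pm, 7, 8, pm, i, you, go, no, Go, to, do]
    }
}
```

Compared with the single-regex version, this trades one extra pass over the string for a split pattern that contains no lookarounds.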
