How to split a String on mathematical operators as delimiters but escape operators inside quotes (in Java)?

For example,

AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs' should be split as

AM2 and 'G - D08 - 28 - 14 .xlsx]General Inputs'.

For input like your example, I would probably match the operands rather than split on the operators.

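import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s  = "AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'";
// Either a quoted string, or a run of characters that are not a space, quote, or operator
Pattern p = Pattern.compile("'[^']*'|[^ '+*/-]+");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group()); // print each operand on its own line
}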

Output

AM2
'G - D08 - 28 - 14 .xlsx]General Inputs'

I don't think you can do this with split; if you can, it would be very tricky and messy. split is good at looking for delimiters, but not so good when a pattern has to be applied to the text between delimiters, as it would have to be in this case.
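To make the problem concrete, here is a small sketch of what a naive split on the operators does to the example input. The class name NaiveSplitDemo and the helper naiveSplit are made up for illustration, and the delimiter regex is only one plausible choice, not something from the question.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Hypothetical helper: split on any single-character operator,
    // swallowing the whitespace around it. It knows nothing about quotes.
    static String[] naiveSplit(String input) {
        return input.split("\\s*[+\\-*/]\\s*");
    }

    public static void main(String[] args) {
        String s = "AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'";
        // The quoted operand is cut apart at every '-' inside it.
        System.out.println(Arrays.toString(naiveSplit(s)));
    }
}
```

The quoted part comes back in several pieces instead of one, which is why the rest of this answer matches operands instead of splitting on delimiters.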

Instead, I'd use a regex to look for the text that occurs between delimiters, and use the Matcher methods. The way I look at problems like this is to think of the non-operator text as a sequence of entities, where each entity is

  • a quoted string;
  • a single character that is not a quote, and is not an operator (or the start of an operator, if some operators are two or more characters).

If all your operators are one character, a regex that finds an "operand" might look like

('[^']*'|[^'+\-*/])*

which says to look for any number of characters between quote marks, or for any single character that is not a quote, + , - , * , or / (note that the - has to be escaped inside the character class). The last * means to look for zero or more of this pattern.
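A minimal sketch of the single-character-operator case, assuming you only need the operands. The regex is my reading of the description above and may not be character-for-character what was originally intended. The class and method names are hypothetical. I use + rather than * as the outer quantifier so that find() does not report empty matches at each operator, and I trim the surrounding spaces, since this pattern treats a space as an ordinary operand character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperandFinder {
    // One operand: a run of quoted strings and of characters that are
    // neither a quote nor one of + - * /  (assumed operator set).
    private static final Pattern OPERAND =
            Pattern.compile("(?:'[^']*'|[^'+\\-*/])+");

    static List<String> operands(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = OPERAND.matcher(input);
        while (m.find()) {
            String operand = m.group().trim(); // drop spaces around the operand
            if (!operand.isEmpty()) {
                result.add(operand);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'";
        for (String operand : operands(s)) {
            System.out.println(operand);
        }
    }
}
```

For the example input this prints AM2 and the whole quoted string as two operands; the - characters inside the quotes are not treated as delimiters.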

To look for a case where an operator might be multiple characters, such as << or >> , you can use negative lookahead:

('[^']*'|(?![+\-*/]|<<|>>)[^'])*

which means to find either a quoted string, or a non-quote character at a point where we're not looking at + , - , * , / , << , or >> , and find this zero or more times.
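Here is a hedged sketch of the lookahead variant. One detail is my own addition: if you run find() with the operand pattern alone, the search can resume on the second character of << and treat it as operand text, so this version also matches the operators in the same pattern and keeps only the operand group. The class name, method name, and sample input are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOperandFinder {
    // Alternatives, tried in order at each position:
    //   1. an operator (<<, >>, or one of + - * /), matched but not captured
    //   2. group 1: an operand, i.e. quoted strings and non-quote characters
    //      at positions where no operator starts (negative lookahead)
    private static final Pattern TOKEN = Pattern.compile(
            "<<|>>|[+\\-*/]|((?:'[^']*'|(?!<<|>>|[+\\-*/])[^'])+)");

    static List<String> operands(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String operand = m.group(1); // null when an operator matched
            if (operand != null && !operand.trim().isEmpty()) {
                result.add(operand.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A single '<' is not an operator here, so "B<2" stays one operand.
        for (String operand : operands("A1 << 'x >> y' + B<2")) {
            System.out.println(operand);
        }
    }
}
```

The lookahead is what lets a lone < remain part of an operand while << still ends it.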

The plan would be to use lookingAt() with a matcher to find the operand, then use lookingAt() to find the operator, and go back and forth. (Or if you don't need to keep the operators at all, use find() as in @hwnd's answer.)
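A rough sketch of that back-and-forth, under some assumptions of mine: the operator set is + - * / << >>, operands are trimmed, and anything that is neither (for example an unmatched quote) is reported as an error. Since lookingAt() always starts at the beginning of the matcher's region, the sketch moves the region forward with region(start, end) after each token. AlternatingTokenizer and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternatingTokenizer {
    private static final Pattern OPERAND =
            Pattern.compile("(?:'[^']*'|(?!<<|>>|[+\\-*/])[^'])+");
    private static final Pattern OPERATOR =
            Pattern.compile("<<|>>|[+\\-*/]");

    // Returns operands and operators in the order they appear.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher operand = OPERAND.matcher(input);
        Matcher operator = OPERATOR.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Restrict both matchers to the unread tail of the input.
            operand.region(pos, input.length());
            operator.region(pos, input.length());
            if (operand.lookingAt()) {
                String text = operand.group().trim();
                if (!text.isEmpty()) {
                    tokens.add(text);
                }
                pos = operand.end();
            } else if (operator.lookingAt()) {
                tokens.add(operator.group());
                pos = operator.end();
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'";
        System.out.println(tokenize(s));
    }
}
```

Both patterns require at least one character per match, so pos always advances and the loop terminates.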

NOTE: I have not tested this. I may have some details wrong, but this should give you an idea of the best approach.

