How to split a string on mathematical operators as delimiters, but ignore operators inside quotes (in Java)?
For example,
AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'
should be split as
AM2
and
'G - D08 - 28 - 14 .xlsx]General Inputs'
For input like your example, I would probably match the tokens rather than split on the delimiters.
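import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match either a whole quoted string, or a run of characters
// that are not spaces, quotes, or operators.
String s = "AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'";
Pattern p = Pattern.compile("'[^']*'|[^ '+*/-]+");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}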
Output:
AM2
'G - D08 - 28 - 14 .xlsx]General Inputs'
I don't think you can do this with split; if you can, it would be very tricky and messy. split is good at looking for delimiters, but not so good when a pattern has to be applied to the stuff in between delimiters, which it would in this case.
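To make the problem with split concrete, here is a small sketch of what a naive operator-based split does to the example from the question. The class name NaiveSplitDemo and the delimiter regex are illustrative assumptions, not something from the question.

```java
import java.util.Arrays;

// Hypothetical demo class: shows why a plain split on operators is not enough.
class NaiveSplitDemo {
    // Split on any single-character operator, swallowing surrounding whitespace.
    // This has no idea that some operators sit inside quotes.
    static String[] naiveSplit(String expr) {
        return expr.split("\\s*[+\\-*/]\\s*");
    }

    public static void main(String[] args) {
        String s = "AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'";
        System.out.println(Arrays.toString(naiveSplit(s)));
    }
}
```

The quoted operand is cut at every " - " inside it, so the result has five pieces ("AM2", "'G", "D08", "28", "14 .xlsx]General Inputs'") instead of the two that were wanted.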
Instead, I'd use a regex to look for the text that occurs between delimiters, and use the Matcher methods. The way I look at problems like this is to think of the non-operator text as a sequence of entities, where each entity is either a quoted string or a single character that is neither a quote mark nor an operator.
If all your operators are one character, a regex that finds an "operand" might look like
('.*?'|[^'+\-*/])*
which says to look for any number of characters between quote marks, or for any single character that is not a quote mark, +, -, *, or / (note that the - has to be escaped inside the character class). The last * means to look for zero or more of this pattern.
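As a rough illustration of the single-character case, the sketch below runs that operand regex with find(). Two things are my own assumptions rather than part of the answer: the trailing * is changed to + so that find() does not report empty matches at each operator, and each match is trimmed because the spaces around operators are part of the matched text. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects operands when every operator is a single character.
class OperandFinder {
    // One or more of: a quoted string, or a character that is
    // neither a quote nor one of + - * /.
    // Assumption: + replaces the trailing * to avoid empty matches with find().
    private static final Pattern OPERAND =
            Pattern.compile("(?:'.*?'|[^'+\\-*/])+");

    static List<String> operands(String expr) {
        List<String> result = new ArrayList<>();
        Matcher m = OPERAND.matcher(expr);
        while (m.find()) {
            // Whitespace next to an operator belongs to the match, so trim it.
            String operand = m.group().trim();
            if (!operand.isEmpty()) {
                result.add(operand);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'";
        for (String operand : operands(s)) {
            System.out.println(operand);
        }
    }
}
```

For the example string this prints AM2 and then the whole quoted operand, with the operators inside the quotes left alone.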
To look for a case where an operator might be multiple characters, such as << or >>, you can use negative lookahead:
('.*?'|(?!\+|-|\*|/|<<|>>)[^'])*
which means to find either a quoted string, or a non-quote character at a point where we're not looking at +, -, *, /, <<, or >>, and find this zero or more times.
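A short sketch of how the lookahead version behaves when anchored at the start of the input with lookingAt(). The class name LeadingOperand and its method are hypothetical; the pattern is the one given above. It is anchored deliberately: scanning with find() alone could restart on the second character of << and treat it as operand text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the operand at the very start of an expression.
class LeadingOperand {
    // Quoted string, or a non-quote character that does not start an operator.
    private static final Pattern OPERAND =
            Pattern.compile("('.*?'|(?!\\+|-|\\*|/|<<|>>)[^'])*");

    static String firstOperand(String expr) {
        Matcher m = OPERAND.matcher(expr);
        // lookingAt() anchors at the start but need not consume the whole input.
        // Because of the trailing *, it can succeed with an empty match.
        return m.lookingAt() ? m.group().trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstOperand("a < b << c"));      // a < b
        System.out.println(firstOperand("'x << y' >> 3"));   // 'x << y'
    }
}
```

Note that a lone < is kept as part of the operand, since only << and >> are listed as operators in the lookahead.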
The plan would be to use lookingAt() with a matcher to find the operand, then use lookingAt() to find the operator, and go back and forth. (Or if you don't need to keep the operators at all, use find() as in @hwnd's answer.)
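The back-and-forth plan could be sketched roughly as follows. This is one possible reading, not a tested implementation from the answer: the class name ExpressionTokenizer, the separate OPERATOR pattern, the use of + instead of * on the operand pattern, and the choice to try an operand first at every position (rather than strictly alternating) are all my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: alternates between operand and operator matches.
class ExpressionTokenizer {
    // Operand: quoted strings and characters that do not start an operator.
    private static final Pattern OPERAND =
            Pattern.compile("(?:'.*?'|(?!\\+|-|\\*|/|<<|>>)[^'])+");
    // Operator: the same set the lookahead above excludes.
    private static final Pattern OPERATOR = Pattern.compile("<<|>>|[+\\-*/]");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher operand = OPERAND.matcher(expr);
        Matcher operator = OPERATOR.matcher(expr);
        int pos = 0;
        while (pos < expr.length()) {
            // Restrict both matchers to the unread tail; lookingAt() then
            // anchors at pos.
            operand.region(pos, expr.length());
            operator.region(pos, expr.length());
            Matcher hit;
            if (operand.lookingAt()) {
                hit = operand;
            } else if (operator.lookingAt()) {
                hit = operator;
            } else {
                // Reached for input such as an unmatched quote.
                throw new IllegalArgumentException("Unexpected input at index " + pos);
            }
            String token = hit.group().trim();
            if (!token.isEmpty()) { // skip operands that are only whitespace
                tokens.add(token);
            }
            pos = hit.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("AM2 + 'G - D08 - 28 - 14 .xlsx]General Inputs'"));
        System.out.println(tokenize("a << 'b >> c' - 1"));
    }
}
```

For the question's example this yields three tokens: AM2, +, and the quoted operand. Dropping the operator tokens from the list gives the two-part split asked for.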
NOTE: I have not tested this. I may have some details wrong, but this should give you an idea of the best approach.