Split Java string on spaces with special characters and complications

I have an input string like:

-a  var1=Bat"m/an  -b   var2=" -a="lol "  -c  var3=" M^a%g-i=c "

After splitting I should get:

Output:

- -a
- var1=Bat"m/an
- -b
- var2=" -a="lol "
- -c
- var3=" M^a%g-i=c "

Rules:

  • format is something like -(char)(at least one space)variable=value
  • value can have any special characters except spaces, e.g. Bat"m/an
  • value can have spaces if it is in quotes, e.g. " -a="lol " or " M^a%g-i=c "

I have written a regex, but quotes inside quotes are messing it up:

(?:"[^"]*"|\S)+

I also tried parsing character by character and splitting on =", but both are ambiguous because those characters can appear inside quotes too.

You may use this regex for matching, with a lookahead assertion:

-?[a-z_]\w*(?:=".*?"(?=\h+(?:-[a-z](?=\h|$)|[a-z]\w*=)|$)|\S+)?


RegEx Explanation:

  • -? : Start with an optional hyphen
  • [a-z_]\w* : Match a name that starts with a lowercase letter or underscore, followed by 0+ word characters
  • (?: : Start non-capture group
    • =".*?"(?=...<expression>) : Match = followed by a quoted string that starts and ends with a double quote. Using the lookahead, we assert that another option or variable, or the end of the line, comes next.
    • | : OR
    • \S+ : Match 1+ non-whitespace characters
  • )? : End non-capture group. The trailing ? makes the group optional, so a bare option such as -a also matches.
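
To try the pattern from Java, something along these lines should work. This is only a minimal sketch: the class name ArgTokenizer and the method tokenize are made up for illustration, and the only change to the regex is that each backslash is doubled for the Java string literal. Note that \h (horizontal whitespace) needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative, not from the answer.
class ArgTokenizer {
    // The pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern TOKEN = Pattern.compile(
        "-?[a-z_]\\w*(?:=\".*?\"(?=\\h+(?:-[a-z](?=\\h|$)|[a-z]\\w*=)|$)|\\S+)?");

    // Collects every match of the pattern, in order, as one token each.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The sample input from the question.
        String input = "-a  var1=Bat\"m/an  -b   var2=\" -a=\"lol \"  -c  var3=\" M^a%g-i=c \"";
        for (String token : tokenize(input)) {
            // Brackets make leading and trailing spaces inside quoted values visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

Matching with find() rather than calling String.split means the quoted values stay in one piece: the lazy .*? keeps extending until the closing quote is followed by the next option, the next variable=, or the end of the line.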

You can try something like this:

(-[a-z]|[^\s][^\s]*="?[^"]*"?[^\s]*)

Here each parameter and each variable=value pair is captured by the group as a separate match.

Explanation:

Capturing group (-[a-z]|[^\s][^\s]*="?[^"]*"?[^\s]*)
1st alternative: -[a-z]
2nd alternative: [^\s][^\s]*="?[^"]*"?[^\s]*
[^\s] - one character that is not whitespace
[^\s]* - zero or more further non-whitespace characters
= - a literal equals sign, which is mandatory
"? - an optional opening double quote
[^"]* - zero or more characters that are not a double quote
"? - an optional closing double quote
[^\s]* - finally, any remaining non-whitespace characters


Hope that helps :)
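
One thing worth checking before relying on this second pattern: because [^"]* stops at the first double quote it meets, a value that itself contains a quote can get cut short. The sketch below (class and method names are hypothetical, chosen only for this illustration) runs the pattern against the sample input from the question so the tokens can be compared with the expected output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative, not from the answer.
class SecondPatternDemo {
    // The second answer's pattern, with backslashes doubled for a Java string literal.
    private static final Pattern TOKEN = Pattern.compile(
        "(-[a-z]|[^\\s][^\\s]*=\"?[^\"]*\"?[^\\s]*)");

    // Returns the text captured by group 1 for every match, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The sample input from the question.
        String input = "-a  var1=Bat\"m/an  -b   var2=\" -a=\"lol \"  -c  var3=\" M^a%g-i=c \"";
        for (String token : tokenize(input)) {
            // Brackets make leading and trailing spaces inside quoted values visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

On that input the fourth token comes out as var2=" -a="lol, without the trailing space and closing quote, so the nested-quote case from the question is not fully covered. The other five tokens match the expected output.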
