
Split string in Java, retain delimiters including items inside quotes

I have a .txt input file as follows:

Start "String" (100, 100) Test One:
  Nextline 10;
  Test Second Third(2, 4, 2, 4):
    String "7";
    String "8";
    Test "";
  End;
End.

I intend to read this file in as one String and then split it on certain delimiters. I've almost reached the desired output with this code:

String tr = entireFile.replaceAll("\\s+", "");

String[] input = tr.split("(?<=[(,):;.])|(?=[(,):;.])|(?=\\p{Upper})");

My current output is:

Start"
String"
(
100
,
100
)
Test
One
:
Nextline10
;
Test
Second
Third
(
2
,
4
,
2
,
4
)
:
String"7"
;
String"8"
;
Test""
;
End
;
End
.

However, I'm having trouble treating quoted items, including an empty pair of quotes "", as separate tokens. So "String", "7" and "" should each be on its own line. Is there a way to do this with regex? My expected output is below; thanks for any help.

Start
"String"
(
100
,
100
)
Test
One
:
Nextline
10
;
Test
Second
Third
(
2
,
4
,
2
,
4
)
:
String
"7"
;
String
"8"
;
Test
""
;
End
;
End
.

Here's the regex I came up with:

String[] input = entireFile.split(
        "\\s+|" +                 // split on (and discard) runs of whitespace, or
        "(?<=\\()|" +             // split after a "(" (positive lookbehind), or
        "(?=[,).:;])|" +          // split before any of , ) . : ; (positive lookahead), or
        "((?<!\\s)(?=\\())");     // split before a "(" that is not preceded by whitespace (negative lookbehind)

To understand all the positive/negative lookahead/lookbehind terminology, take a look at this answer.
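
As a quick illustration of that terminology, the sketch below splits the same two-token string four ways. The class name LookaroundDemo and the constant names are made up for this example; only String.split and the standard lookaround syntax are used.

```java
import java.util.Arrays;

class LookaroundDemo {
    // A plain delimiter is consumed: the comma vanishes from the result.
    static final String CONSUME = ",";
    // Positive lookahead: split just before a comma, so it starts the next piece.
    static final String BEFORE = "(?=,)";
    // Positive lookbehind: split just after a comma, so it ends the previous piece.
    static final String AFTER = "(?<=,)";
    // Both together: the comma becomes a piece of its own.
    static final String BOTH = "(?<=,)|(?=,)";

    public static void main(String[] args) {
        for (String regex : Arrays.asList(CONSUME, BEFORE, AFTER, BOTH)) {
            // Pieces are joined with " | " so the boundaries are easy to see.
            System.out.println(regex + "  ->  " + String.join(" | ", "a,b".split(regex)));
        }
    }
}
```

Lookarounds match a position rather than characters, which is why the delimiter survives the split in the last three cases.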

Note that you should apply this split directly to the contents of the input file without removing whitespace first; in other words, take out this line:

String tr=  entireFile.replaceAll("\\s+", "");
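
To check this against the sample from the question, a small harness along the following lines may help. It is only a sketch: the class name TokenizeDemo and the helpers splitTokens and matchTokens are invented for illustration. splitTokens uses the split pattern from this answer unchanged. matchTokens is a different technique, not part of the answer: it describes the tokens with a Pattern and collects them with Matcher.find, assuming quoted strings never contain escaped quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizeDemo {
    // Sample input from the question.
    static final String SAMPLE = String.join("\n",
            "Start \"String\" (100, 100) Test One:",
            "  Nextline 10;",
            "  Test Second Third(2, 4, 2, 4):",
            "    String \"7\";",
            "    String \"8\";",
            "    Test \"\";",
            "  End;",
            "End.");

    // The split pattern from the answer, unchanged.
    static final String SPLIT_REGEX =
            "\\s+|(?<=\\()|(?=[,).:;])|((?<!\\s)(?=\\())";

    // Hypothetical alternative: describe the tokens instead of the gaps.
    //   "[^"]*"          a quoted string, kept whole (no escaped quotes)
    //   [(),:;.]         a single punctuation character
    //   [^\s"(),:;.]+    a run of any other non-space characters
    static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|[(),:;.]|[^\\s\"(),:;.]+");

    static List<String> splitTokens(String text) {
        return Arrays.asList(text.split(SPLIT_REGEX));
    }

    static List<String> matchTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One token per line for the sample input.
        splitTokens(SAMPLE).forEach(System.out::println);

        // The two approaches differ once a quoted item contains whitespace.
        String tricky = "Start \"Two Words\" (1, 2)";
        System.out.println(splitTokens(tricky));
        System.out.println(matchTokens(tricky));
    }
}
```

On the sample both helpers produce the same tokens. The split-based version relies on whitespace separating quoted items from their neighbours, so a quoted item that itself contains a space is broken apart; the match-based sketch keeps it whole, at the cost of silently skipping characters that fit none of its alternatives, such as an unterminated quote.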

