Regex to tokenize string in Java with space and double quotes
I'm trying to create a regex to tokenize a string. An example string would be:
"hello world" Alexandros Alex "I Am" Something
I need to get this response back:
hello world
Alexandros
Alex
I Am
Something
So to make it clear: tokenize on spaces, but do not split the words within quotes. If this is an easy regular expression, sorry in advance, but I always struggle with these.
You could try: \b(?:(?<=")[^"]*(?=")|\w+)\b
This will exclude the actual quotes from the matches.
    import java.util.regex.*;

    public class Test {
        public static void main(String... args) {
            String line = "\"hello world\" Alexandros Alex \"I Am\" Something";
            // Match either the text between two quotes or a bare word.
            Pattern pattern = Pattern.compile("\\b(?:(?<=\")[^\"]*(?=\")|\\w+)\\b");
            Matcher matcher = pattern.matcher(line);
            // Print each token on its own line.
            while (matcher.find()) {
                System.out.println(matcher.group(0));
            }
        }
    }
When executed, you get this output:
$ javac Test.java
$ java Test
hello world
Alexandros
Alex
I Am
Something
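One limitation worth noting: because of the \b anchors, the pattern above only matches quoted text that begins and ends with a word character. If that matters, a different technique (not the one from this answer) is to use two capture groups, one for quoted text and one for bare tokens. The following is a minimal sketch under that assumption; the class name QuoteTokenizer and the method tokenize are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative and not part of the original answer.
class QuoteTokenizer {
    // Group 1: the contents of a double-quoted run, quotes excluded.
    // Group 2: a run of non-whitespace characters outside quotes.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(line);
        while (matcher.find()) {
            // Only one of the two groups participates in any given match.
            String quoted = matcher.group(1);
            tokens.add(quoted != null ? quoted : matcher.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\"hello world\" Alexandros Alex \"I Am\" Something";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

This sketch assumes quotes are never escaped or nested, and it treats an unmatched quote as part of an ordinary token.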
If you want to split instead, you can do so by checking whether the " characters are balanced. If a space sits between a pair of quotes, the number of " characters that follow it is odd rather than even, so that space must not be used as a separator. This is what the regex below checks:
\s(?=(?:([^"]*"[^"]*"[^"]*)*|[^"]*)$)
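As a rough illustration, the regex above can be passed to String.split. Note that splitting this way leaves the surrounding quotes on the quoted tokens, so this sketch strips one pair afterwards. The class name QuoteAwareSplit and the method split are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative and not part of the original answer.
class QuoteAwareSplit {
    // The regex from the answer, escaped for a Java string literal:
    // a whitespace character followed by an even number of quotes up to the end.
    private static final String SPACE_OUTSIDE_QUOTES =
            "\\s(?=(?:([^\"]*\"[^\"]*\"[^\"]*)*|[^\"]*)$)";

    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        for (String part : line.split(SPACE_OUTSIDE_QUOTES)) {
            // split() keeps the quotes, so remove one surrounding pair if present.
            if (part.length() >= 2 && part.startsWith("\"") && part.endsWith("\"")) {
                part = part.substring(1, part.length() - 1);
            }
            tokens.add(part);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\"hello world\" Alexandros Alex \"I Am\" Something";
        for (String token : split(line)) {
            System.out.println(token);
        }
    }
}
```

Because \s matches a single whitespace character, consecutive spaces outside quotes would produce empty tokens with this sketch; it also assumes the quotes in the input are balanced.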