
Regex to tokenize a string in Java with spaces and double quotes

I'm trying to create a regex to tokenize a string. An example string would be:

"hello world" Alexandros Alex "I Am" Something

I need to get this response back:

hello world
Alexandros
Alex
I Am
Something

So, to make it clear: tokenize on spaces, but keep the words inside quotes together as one token. If this is an easy regular expression, sorry in advance, but I always struggle with these.

You could try: \\b(?:(?<=")[^"]*(?=")|\\w+)\\b (backslashes are doubled as they would be in a Java string literal). This will exclude the actual quotes from the matches.

import java.util.regex.*;

public class Test {
    public static void main(String... args) {
        String line = "\"hello world\" Alexandros Alex \"I Am\" Something";
        // Either the text between two double quotes (quotes excluded via lookarounds) or a plain word.
        Pattern pattern = Pattern.compile("\\b(?:(?<=\")[^\"]*(?=\")|\\w+)\\b");
        Matcher matcher = pattern.matcher(line);
        // Print each token on its own line.
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

When executed, you get this output:

$ javac Test.java
$ java Test
hello world
Alexandros
Alex
I Am
Something

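This regular expression will match either words or entire strings within quotes: "[^"]*"|\\w+ . The quoted matches include the surrounding quotes. \\w+ is used here rather than \\w* so that the pattern never produces empty matches.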

You can create a matcher with this regex and just iterate through all the matches. You can find some sample code here.
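As a minimal sketch of iterating over the matches, the example below assumes a variant of the pattern above with two capturing groups added, so that the surrounding quotes can be dropped from quoted tokens. The class name QuoteTokenizer and the method tokenize are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteTokenizer {
    // Group 1: text inside double quotes; group 2: a bare word.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\w+)");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(line);
        while (matcher.find()) {
            // Exactly one of the two groups participates in each match.
            String quoted = matcher.group(1);
            tokens.add(quoted != null ? quoted : matcher.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\"hello world\" Alexandros Alex \"I Am\" Something";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

For the example string this prints the five tokens hello world, Alexandros, Alex, I Am and Something, one per line. An opening quote with no closing quote is simply skipped, and the words after it are matched individually.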

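If you want to split, you can do so by checking whether the " characters after each space are balanced.

If a space is inside a quoted string, the number of " characters that follow it is odd. If it is outside quotes, that number is even. The regex below matches only whitespace that is followed by an even number of " characters, so splitting on it leaves quoted strings intact: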

\s(?=(?:([^"]*"[^"]*"[^"]*)*|[^"]*)$)
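A rough sketch of how such a split could look in Java follows. It assumes a slightly simplified form of the lookahead, (?:[^"]*"[^"]*")*[^"]*$, which likewise requires an even number of quotes up to the end of the input. The class name QuoteAwareSplit and the method split are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Whitespace followed by an even number of double quotes up to the end of input,
    // i.e. whitespace that is not inside a quoted string.
    private static final Pattern OUTSIDE_QUOTES =
            Pattern.compile("\\s(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> split(String line) {
        return Arrays.asList(OUTSIDE_QUOTES.split(line));
    }

    public static void main(String[] args) {
        String line = "\"hello world\" Alexandros Alex \"I Am\" Something";
        for (String token : split(line)) {
            // Quoted tokens still carry their quotes after splitting.
            System.out.println(token);
        }
    }
}
```

Unlike the matcher-based approaches, splitting keeps the quotes on the quoted tokens, so "hello world" and "I Am" come back with their quotes and would need them stripped afterwards, for example with token.replace("\"", ""). With an unbalanced quote in the input, every space before it is followed by an odd number of quotes, so none of those spaces would be treated as separators.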

