Parsing a string that contains double quotes

I have a relatively simple Java question. I have a string that looks like this:

"Anderson,T",CWS,SS

I need to parse it so that I end up with

Anderson,T    
CWS    
SS

all as separate strings.

Thanks!

Here's a solution that captures quoted strings, trims surrounding whitespace, and matches empty items:

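import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedCsvRegex {
    public static void main(String[] args) {
        // Quoted field: the closing quote must follow an even number of backslashes
        String quoted = "\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"";
        Pattern regex = Pattern.compile(
            "(?:^|(?<=,))\\s*(" + quoted + "|[^,]*?)\\s*(?:$|,)");

        String line = "\"Anderson,T\",CWS,\"single quote\\\"\", SS ,,hello,,";
        Matcher m = regex.matcher(line);
        int count = 0;
        while (m.find()) {
            // group 2 is the text inside the quotes; it is null for unquoted fields
            String s = m.group(2) == null ? m.group(1) : m.group(2);
            System.out.println(s);
            count++;
        }
        System.out.printf("(%d matches found)%n", count);
    }
}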

I split out the quoted part of the pattern to make it a bit easier to follow. Capturing group 1 is the whole field (including the surrounding quotes when the field is quoted); group 2 is the text inside the quotes and is null for unquoted fields.

To break down the overall pattern:

  1. Look for the start of the line or a preceding comma (?:^|(?<=,)) (don't capture)
  2. Skip zero or more whitespace characters \\s*
  3. Look for a quoted string or a string without a comma (" + quoted + "|[^,]*?) (the non-comma match is non-greedy so it doesn't grab any trailing whitespace)
  4. Skip zero or more whitespace characters again \\s*
  5. Look for the end of the line or a comma (?:$|,) (don't capture)
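
As a rough sketch of how the five steps above behave on whitespace and empty items, the same pattern can be wrapped in a small method. The class name CsvLineRegex and the helper parseLine are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvLineRegex {
    // Same quoted sub-pattern and overall pattern as in the answer.
    private static final String QUOTED = "\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"";
    private static final Pattern FIELD = Pattern.compile(
        "(?:^|(?<=,))\\s*(" + QUOTED + "|[^,]*?)\\s*(?:$|,)");

    // Hypothetical helper: returns the fields of one line, with the
    // surrounding quotes and surrounding whitespace removed.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Prefer the text inside the quotes when the field was quoted.
            fields.add(m.group(2) != null ? m.group(2) : m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"Anderson,T\",CWS,SS"));
        // Whitespace around SS is skipped by steps 2 and 4; the empty
        // item between the two commas is still reported.
        System.out.println(parseLine(" SS ,,hello"));
    }
}
```

Compiling the pattern once as a constant avoids rebuilding it for every line, which may matter if many lines are parsed.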

To break down the quote pattern:

  1. Look for the opening quote \"
  2. Start the capturing group (
  3. Get the minimum match of any characters .*?
  4. Match an even number (zero or more) of backslashes that are not preceded by another backslash (?<!\\\\)(?:\\\\\\\\)* (to avoid treating an escaped quote as the closing quote, whether or not escaped backslashes precede it)
  5. Close the capturing group )
  6. Match the closing quote \"
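
To see the effect of step 4 in isolation, here is a minimal sketch that applies only the quote pattern to two inputs. The class QuotedPatternDemo and the helper firstQuoted are hypothetical names used for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedPatternDemo {
    // The quote pattern from the answer, compiled on its own.
    static final Pattern QUOTED =
        Pattern.compile("\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"");

    // Hypothetical helper: text inside the first quoted string in s,
    // or null when s contains no quoted string.
    static String firstQuoted(String s) {
        Matcher m = QUOTED.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Input is "say \"hi\"",x : the escaped quotes do not end the match.
        System.out.println(firstQuoted("\"say \\\"hi\\\"\",x"));
        // Input is "path\\",x : two backslashes form an escaped backslash,
        // so the quote that follows them is a real closing quote.
        System.out.println(firstQuoted("\"path\\\\\",x"));
    }
}
```

The captured text still contains the backslashes; unescaping them would be a separate step that this pattern does not perform.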

Assuming your string looks like this:

String input = "\"Anderson,T\",CWS,SS";

You can use this solution, which was written for a similar scenario:

import java.util.ArrayList;
import java.util.List;

public class QuotedCsvSplit {
    public static void main(String[] args) {
        String input = "\"Anderson,T\",CWS,SS";
        List<String> result = new ArrayList<>();
        int start = 0;            // index where the current field starts
        boolean inQuotes = false; // true while between a pair of double quotes

        for (int current = 0; current < input.length(); current++) {
            char c = input.charAt(current);
            if (c == '"') {
                inQuotes = !inQuotes; // toggle state on every quote
            } else if (c == ',' && !inQuotes) {
                // a comma outside quotes ends the current field
                result.add(input.substring(start, current));
                start = current + 1;
            }
        }
        // add the last field after the loop, so a trailing comma is handled too
        result.add(input.substring(start));
        System.out.println(result);
    }
}

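I have modified it a bit to improve readability. This code stores the fields in the list result. Note that quoted fields keep their surrounding double quotes (the first element is "Anderson,T", quotes included), so strip them if you need the bare text.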
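
If the goal is the bare Anderson,T from the question, one possible variation is to build each field character by character and leave the quote characters out. This is only a sketch: the class QuoteAwareSplit and its split method are hypothetical, and it assumes quotes are balanced and never escaped.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Hypothetical helper. Assumes balanced, unescaped double quotes.
    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle state, drop the quote
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());  // comma outside quotes ends a field
                field.setLength(0);
            } else {
                field.append(c);               // ordinary character, or comma in quotes
            }
        }
        fields.add(field.toString());          // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("\"Anderson,T\",CWS,SS"));
    }
}
```

Unlike the regex version, this sketch does not trim whitespace or understand backslash escapes; for real CSV files a dedicated parser is usually the safer choice.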
