
Need help with a regex that avoids splitting a string inside "

I need to split a String using a comma as the separator, but if part of the string is enclosed in ", that portion must not be split, from the opening " to the closing ", even if it contains commas in between.

Can anyone please help me solve this using a regex with lookarounds?

Resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds very similar to "regex-match a pattern unless..." (see Reference).

\"[^\"]*\"|(,)

The left side of the alternation matches complete double-quoted strings. We will ignore these matches. The right side matches commas and captures them in Group 1, and we know they are the right commas because they were not consumed by the expression on the left.

Here is working code (see online demo):

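import java.util.regex.*;

class Program {
    public static void main (String[] args) {

        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        // Left alternative: a complete quoted string (ignored).
        // Right alternative: a comma, captured in Group 1.
        Pattern regex = Pattern.compile("\"[^\"]*\"|(,)");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            // Group 1 is set only for commas outside quotes: mark them as split points.
            if (m.group(1) != null) m.appendReplacement(b, "SplitHere");
            // Otherwise copy the quoted string back unchanged; quoteReplacement keeps
            // any '$' or '\' in it from being read as replacement syntax.
            else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
        }
        m.appendTail(b);
        String replaced = b.toString();
        String[] splits = replaced.split("SplitHere");
        for (String split : splits)
            System.out.println(split);
    } // end main
} // end Program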
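
One possible variation, in case the input could itself contain the placeholder text SplitHere: the same pattern can drive the split directly from the match positions, so no marker string is needed. This is only a sketch; the class name QuoteAwareSplit and the helper split are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Same alternation as above: quoted strings are matched and skipped,
    // commas outside quotes are captured in Group 1.
    private static final Pattern QUOTED_OR_COMMA = Pattern.compile("\"[^\"]*\"|(,)");

    // Hypothetical helper: cuts the subject at every Group 1 comma.
    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED_OR_COMMA.matcher(subject);
        int start = 0; // index where the current piece begins
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add(subject.substring(start, m.start(1)));
                start = m.end(1);
            }
        }
        parts.add(subject.substring(start)); // text after the last split point
        return parts;
    }

    public static void main(String[] args) {
        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        for (String part : split(subject)) {
            System.out.println(part);
        }
    }
}
```

With the sample subject this prints the same four pieces as the code above. Unlike String.split, this helper keeps trailing empty pieces, which may or may not be what you want.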

Reference

  1. How to match pattern except in situations s1, s2, s3

Please try this:


(?<!\G\s*"[^"]*),


If you put this regex in your program, it should be written as:

String regex = "(?<!\\G\\s*\"[^\"]*),";


But two things are not clear:

  1. Does a " only start next to a , , or can it also start in the middle of the content, as in AAA, BB"CC,DD" ? The regex above only handles a " that starts next to a , .

  2. If the content contains " itself, how is it escaped? With "" or \" ? The regex above does not handle any escaped " format.
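
On the second point, here is one possible sketch for the case where a quote inside quoted content is escaped as \" . It does not extend the lookbehind regex; instead it widens the quoted-string side of the alternation pattern from the earlier answer so that a backslash plus the following character is consumed as a unit. The class name EscapedQuoteSplit and the helper split are hypothetical, and the backslash escaping convention is an assumption, since the question does not say which one applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteSplit {
    // Regex (unescaped): "(?:[^"\\]|\\.)*"|(,)
    // Left: a quoted string in which a backslash escapes the next character.
    // Right: a comma outside quotes, captured in Group 1.
    private static final Pattern QUOTED_OR_COMMA =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|(,)");

    // Hypothetical helper: cuts the subject at every Group 1 comma.
    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED_OR_COMMA.matcher(subject);
        int start = 0; // index where the current piece begins
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add(subject.substring(start, m.start(1)));
                start = m.end(1);
            }
        }
        parts.add(subject.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Subject as plain text: "a\",b",c
        for (String part : split("\"a\\\",b\",c")) {
            System.out.println(part);
        }
    }
}
```

For that subject the comma after the escaped quote stays inside the first piece, so the pieces are "a\",b" and c. If quotes are escaped as "" instead, the simpler pattern "[^"]*"|(,) already keeps such a field together for splitting purposes, because the doubled quote reads as one quoted string ending and the next one starting with no comma between them.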
