
How to avoid strings enclosed in double quotes using regex?

I'm using a regex in Java to get all the strings in the text below, excluding the double quotes and the strings inside the double quotes:

"Lorem ipsum mauris "libero" non "pulvinar" suscipit, nis "aenean" 
curae odio lobortis "nulla" suspendisse"

I can get the strings enclosed in the double quotes using:

((\")(\S+)(\"))

This matches the quoted strings, which is the opposite of what I want. But when I try to negate the pattern:

 [^((\")(\S+)(\"))]

the strings not enclosed in double quotes don't get matched.

What I want is this:

 "Lorem","ipsum","mauris","non","suscipit",",","nis","curae",
"odio","lobortis","suspendisse"


Any help would be appreciated.

A character class can only negate individual characters; it can't negate a whole pattern the way you tried.
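To see why the negation attempt fails, here is a minimal sketch (the class name NegatedClassDemo, the helper findAll and the shortened input are made up for illustration). Inside the brackets, the parentheses, the quote and the plus sign are ordinary characters and \S is a predefined class, so the whole thing is a single negated character class that matches one character at a time. On this input it should behave just like \s:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "non \"pulvinar\" suscipit";

        // The attempted "negated pattern": one character class that excludes
        // '(', ')', '"', '+' and every non-whitespace character.
        List<String> attempted = findAll("[^((\\\")(\\S+)(\\\"))]", input);

        // A plain whitespace class, for comparison.
        List<String> whitespace = findAll("\\s", input);

        System.out.println(attempted.size());             // number of matches
        System.out.println(attempted.equals(whitespace)); // same matches as \s?
    }
}
```

So the only things left to match are single whitespace characters, not the unquoted words.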

You can use this regex, which uses lookarounds (a negative lookbehind and a negative lookahead) to reject words that are surrounded by double quotes:

(?<!")\b\w+\b(?!")

Here the word boundary \b ensures that a partial word is not detected as a match. For example, in the word "libero", if we don't put \b around \w+, the regex may detect iber as a match from the middle of that word.
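The effect of \b can be checked with a small sketch like the one below. It assumes the negative lookbehind form (?<!") for the leading lookaround; the class name WordBoundaryDemo, the helper findAll and the shortened input are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "mauris \"libero\" non";

        // Without \b the engine can start after the 'l' and stop before the 'o',
        // so a fragment of the quoted word slips through.
        System.out.println(findAll("(?<!\")\\w+(?!\")", input));
        // prints [mauris, iber, non]

        // With \b on both sides only whole words can match.
        System.out.println(findAll("(?<!\")\\b\\w+\\b(?!\")", input));
        // prints [mauris, non]
    }
}
```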


The equivalent Java code would be:

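import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Lorem ipsum mauris \"libero\" non \"pulvinar\" suscipit, nis \"aenean\" curae odio lobortis \"nulla\" suspendisse";
// Match whole words that are neither preceded nor followed by a double quote.
Pattern p = Pattern.compile("(?<!\")\\b\\w+\\b(?!\")");
Matcher m = p.matcher(s);

// Print each match on its own line.
while (m.find()) {
    System.out.println(m.group());
}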

Which prints:

Lorem
ipsum
mauris
non
suscipit
nis
curae
odio
lobortis
suspendisse

Edit:

I realized you also want the comma as a matched string. In that case you can extend the regex with an alternative, like this:

(?<!")\b\w+\b(?!")|,

I am guessing that your string might contain special characters other than the comma too; in that case you can use a character class such as [,.!;] instead of just the comma. How you write it depends on how you want to group those characters: to match a run of consecutive characters such as ,;! as one string, use [,.!;]+; to match each special character individually, keep the plain character class.
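The difference between the two groupings can be sketched as follows. This is only an illustration: the class name PunctuationDemo, the helper findAll and the sample input are made up, and the word part of the regex is written with the negative lookbehind (?<!").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "odio,;! lobortis \"nulla\" suspendisse.";
        String word = "(?<!\")\\b\\w+\\b(?!\")";

        // Plain character class: each special character is its own match.
        System.out.println(String.join(" | ", findAll(word + "|[,.!;]", input)));
        // prints odio | , | ; | ! | lobortis | suspendisse | .

        // Character class with +: consecutive special characters form one match.
        System.out.println(String.join(" | ", findAll(word + "|[,.!;]+", input)));
        // prints odio | ,;! | lobortis | suspendisse | .
    }
}
```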
