
Is there a simple Java Regex (*SKIP)(*F) alternative?

I am writing a program in Java and have run into a slight problem with regular expressions. I want to capture everything that is not enclosed in quotes. I have a regex pattern for that, but it cannot be used in Java: it uses the `(*SKIP)(*F)` trick to skip over the `".*"` and then finds anything else (using `[^\W]`). I have another pattern that is close, but not quite what I need. It finds everything that does not have a quote directly in front of or behind it. The issue with that one is that, given something like `Test1 "Hello World!" Test2`, it will grab `Test1`, `Test2`, and `World`. I do not want to get `World`, because it is inside the quotes. What I want to know is whether this is even possible, and if so, how.


You must match the content you want to avoid and use a capture group to extract what you want (I don't think there is another way). A convenient pattern for that is:

(?:[^\w"]+|"[^"]*")*+(\w+)
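As a minimal sketch of how this pattern might be used from Java, assuming the quotes in the input are balanced: the class name `UnquotedWords` and the method `extract` are hypothetical, introduced only for illustration. The possessive group consumes non-word characters and whole quoted sections, and group 1 then holds the next word outside quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the original post.
final class UnquotedWords {
    // Possessively skip non-word characters and complete "..." sections,
    // then capture a single word in group 1.
    private static final Pattern WORD =
            Pattern.compile("(?:[^\\w\"]+|\"[^\"]*\")*+(\\w+)");

    static List<String> extract(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            words.add(m.group(1)); // group 1 is the word outside the quotes
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Test1, Test2]
        System.out.println(extract("Test1 \"Hello World!\" Test2"));
    }
}
```

Note that the whole match (`m.group()`) also contains the skipped text, so only group 1 should be read.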

The `(*SKIP)(*F)` verbs are a useful way to tell the regex engine (PCRE in this case) that you want to discard those matches.

Java doesn't have these verbs, but you can use the same approach in Java without `(*SKIP)(*F)`: match the quoted content you want to discard, and capture the content you want in a group. So you can use:

".*"|([^\W]+)
or
".*"|(\w+)
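The "match what you want to discard, capture what you want to keep" idea could look roughly like this in Java. This is only a sketch: the class `SkipQuotedAlternation` and its `extract` method are hypothetical names. When the quoted alternative matches, group 1 is `null`, so those matches are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the original post.
final class SkipQuotedAlternation {
    // Left alternative: a quoted section to discard.
    // Right alternative: a word to keep, captured in group 1.
    private static final Pattern P = Pattern.compile("\".*\"|(\\w+)");

    static List<String> extract(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            String word = m.group(1);
            if (word != null) { // null means the quoted alternative matched
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Test1, Test2]
        System.out.println(extract("Test1 \"Hello World!\" Test2"));
    }
}
```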

Unfortunately I can't yet comment on other posts, but Federico Piazza's solution will fail if there are multiple sets of quotes, because the greedy `.*` matches from the first quote all the way to the last one. For example, if your text was the following:

String text = "test1 \"hello world!\" test2 \"foobar\" test3";
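To make the failure concrete, here is a small sketch that runs the greedy pattern against this text. For comparison it also runs a variant that uses `"[^"]*"` for the quoted part; that variant is an assumption added here for illustration, not necessarily the fix this answer went on to propose. The class `MultipleQuoteSets` and its `extract` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the original post.
final class MultipleQuoteSets {
    // Collects group 1 of every match, ignoring matches where group 1 did not participate.
    static List<String> extract(String regex, String input) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                words.add(m.group(1));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "test1 \"hello world!\" test2 \"foobar\" test3";
        // Greedy .* swallows everything between the first and the last quote.
        System.out.println(extract("\".*\"|(\\w+)", text));     // [test1, test3]
        // Assumed variant: [^"]* cannot run past the closing quote.
        System.out.println(extract("\"[^\"]*\"|(\\w+)", text)); // [test1, test2, test3]
    }
}
```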

If you want the words outside of quotes, excluding trailing spaces:

[^"\s]++((?=\s*"[^\s])|(?=\s*$)|(?=[^"]+\s+"))
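A rough sketch of applying this lookahead pattern in Java follows. Here the whole match is the word, so `m.group()` is read rather than a capture group. The class `LookaheadWords` and its `extract` method are hypothetical names, and the pattern relies on the spacing around the quotes looking like the sample text (a space before each opening quote, none before each closing quote), so treat it as a heuristic rather than a general solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the original post.
final class LookaheadWords {
    // A run of non-quote, non-space characters that is followed by an opening quote,
    // by the end of the input, or by more unquoted text and then a quote.
    private static final Pattern P = Pattern.compile(
            "[^\"\\s]++((?=\\s*\"[^\\s])|(?=\\s*$)|(?=[^\"]+\\s+\"))");

    static List<String> extract(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            words.add(m.group()); // the lookaheads consume nothing, so this is just the word
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [test1, test2, test3]
        System.out.println(extract("test1 \"hello world!\" test2 \"foobar\" test3"));
    }
}
```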
