I am writing a program in Java and have run into a slight problem with regular expressions. I want to capture everything not enclosed in quotes. I have a regex pattern for that, right here, but it cannot be used in Java. It uses the `(*SKIP)(*F)` trick to skip over the `".*"` part and match anything else (using `[^\W]`), but, as I said, that does not work in Java.
I have another pattern that is close, but not quite what I need. It finds everything that does not have quotes directly in front of or behind it. The issue with that one is that if I have something like this:
Test1 "Hello World!" Test2
it will grab `Test1`, `Test2`, AND `World`.
I do not want to get `World`, because it is inside the quotes.
What I want to know is whether it is even possible to do what I want and, if so, how.
You must match the content you want to avoid and use a capture group to extract what you want (I don't think there is another way). A convenient pattern to do that can be:
(?:[^\w"]+|"[^"]*")*+(\w+)
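As a minimal sketch of how that pattern might be used from Java: the class and method names below (`UnquotedWords`, `unquotedWords`) are hypothetical and chosen only for illustration. Each match consumes any separators and complete quoted strings first, so the wanted word is always in capture group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnquotedWords {
    // The pattern from the answer above, escaped as a Java string literal.
    private static final Pattern WORD_OUTSIDE_QUOTES =
            Pattern.compile("(?:[^\\w\"]+|\"[^\"]*\")*+(\\w+)");

    // Returns every word that is not inside a pair of double quotes.
    static List<String> unquotedWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_OUTSIDE_QUOTES.matcher(text);
        while (m.find()) {
            // Group 1 holds the word; the skipped prefix is not captured.
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Test1, Test2]
        System.out.println(unquotedWords("Test1 \"Hello World!\" Test2"));
    }
}
```

Note that `m.group()` (group 0) would also include the skipped separators and quoted text, so only `m.group(1)` should be read.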
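The `(*SKIP)(*F)` verbs are quite a useful way to tell the regex engine (PCRE in this case) that you want to discard those matches. Java's regex engine does not support them, but you can get a similar effect by matching the quoted part and capturing everything else in a group: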
".*"|([^\W]+)
or
".*"|(\w+)
Unfortunately I can't yet comment on other posts, but Federico Piazza's solution will fail if there are multiple sets of quotes. For example, if your text were the following:
String text = "test1 \"hello world!\" test2 \"foobar\" test3";
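To illustrate the failure, here is a small sketch that runs `".*"|(\w+)` against that text. Because `.*` is greedy, the quoted alternative stretches from the first quote to the last one and swallows `test2`. The second pattern, which replaces `.*` with `[^"]*`, is one possible fix and is an assumption of this sketch rather than something taken from the posts above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAlternationDemo {
    // Collects capture group 1 from every match in which it participated.
    static List<String> capturedWords(String regex, String text) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            // group(1) is null whenever the quoted alternative matched instead.
            if (m.group(1) != null) {
                words.add(m.group(1));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "test1 \"hello world!\" test2 \"foobar\" test3";

        // Greedy version: prints [test1, test3]
        System.out.println(capturedWords("\".*\"|(\\w+)", text));

        // Assumed fix with a negated character class: prints [test1, test2, test3]
        System.out.println(capturedWords("\"[^\"]*\"|(\\w+)", text));
    }
}
```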
If you want the words outside of quotes, excluding trailing spaces:
[^"\s]++((?=\s*"[^\s])|(?=\s*$)|(?=[^"]+\s+"))