
Regex to match single words/character sets that aren't in quotes

I'm looking to write a regex (C#) that will match words that aren't surrounded by quotes. An example input string would be:

dbo.test line_length "quoted words" notquoted

And this needs to match

dbo.test

line_length

notquoted

So 3 separate matches and "quoted words" is not matched. The quoted phrase could be anywhere in the input...beginning, middle, end, etc.

I haven't been able to come up with a regex that matches words not in quotes where there could be a space in the quotes...I've been able to match something like: hello "world" and only get hello.

Is there a way to write the regex I'm trying to?

There are two ways to tackle this, depending on what you want to do with the output.

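First, match any text within quotation marks without capturing it. (This deliberately consumes the stuff that you DON'T want.) Then, after the | pipe, use a capture group to pick up everything that you DO want to keep. The quoted phrases still show up as matches, but with group 1 unset, so keep only the matches where group 1 participated (in C#, where match.Groups[1].Success is true).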

Example:

".*?"|(\b\S+\b)

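As a rough sketch of how the results of that pattern could be filtered, here is a small Python example (Python is used only for illustration; the function name `unquoted_words` is hypothetical and not part of the original answer). It keeps only the matches where the capture group actually participated.

```python
import re

# First alternative consumes a quoted phrase without capturing it;
# the second alternative captures an unquoted word in group 1.
PATTERN = re.compile(r'".*?"|(\b\S+\b)')

def unquoted_words(text):
    """Return the words in `text` that are not inside double quotes."""
    # group(1) is None whenever the quoted alternative matched, so skip those.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

print(unquoted_words('dbo.test line_length "quoted words" notquoted'))
# ['dbo.test', 'line_length', 'notquoted']
```

The same pattern string should carry over to C# unchanged; there the filter would test match.Groups[1].Success instead of comparing against None.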

The other option uses look-arounds: a negative lookbehind at the start of the word and a negative lookahead at its end ensure that a " doesn't appear directly before or after it:

(?<!")(\b\S+\b)(?!")


This breaks down once a quoted phrase contains three or more words, because the middle words don't touch either quotation mark and will still match. It should get you on the right track, though, and you can indicate whether one of these methods works better for you than the other.
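To make the look-around behaviour and its limitation concrete, here is a minimal Python sketch (again only an illustration; `words_not_touching_quotes` is a hypothetical name). It works on the sample input, but a word in the middle of a longer quoted phrase slips through.

```python
import re

# A word is accepted only if no quote sits directly before or directly after it.
LOOKAROUND = re.compile(r'(?<!")(\b\S+\b)(?!")')

def words_not_touching_quotes(text):
    """Return words that are not immediately adjacent to a double quote."""
    # With exactly one capture group, findall returns that group's text per match.
    return LOOKAROUND.findall(text)

print(words_not_touching_quotes('dbo.test line_length "quoted words" notquoted'))
# ['dbo.test', 'line_length', 'notquoted']

# The middle word of a three-word quoted phrase is wrongly matched:
print(words_not_touching_quotes('a "x y z" b'))
# ['a', 'y', 'b']
```

If quoted phrases can contain more than two words, the alternation approach is the safer of the two, since it consumes the whole quoted span before looking for words.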
