
How to match something with regex that is not between two special characters?

I have a string like this:

abcab " ab " ba " a "

How do I match every a that is not part of a string delimited by "? I want to match every a marked with brackets here:

[a]bc[a]b " ab " b[a] " a "

I want to replace those matches (or rather remove them by replacing them with an empty string), so removing the quoted parts for matching won't work, because I want those to remain in the string. I'm using Ruby.

Assuming the quotes are correctly balanced and there are no escaped quotes, then it's easy:

result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

This replaces every a with the empty string if and only if the matched a is followed by an even number of quotes.

Explanation:

a        # Match a
(?=      # only if it's followed by...
 (?:     # ...the following:
  [^"]*" #  any number of non-quotes, followed by one quote
  [^"]*" #  the same again, ensuring an even number
 )*      # any number of times (0, 2, 4 etc. quotes)
 [^"]*   # followed by only non-quotes until
 \Z      # the end of the string.
)        # End of lookahead assertion
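
As a quick check of the explanation above, here is a minimal sketch that applies the same regex to the question's sample string. The constant and method names are made up for illustration, and the balanced-quotes, no-escapes assumption still applies.

```ruby
# Hypothetical names; the regex itself is the one from the answer above.
UNQUOTED_A = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/

# Remove every a that is followed by an even number of quotes,
# i.e. every a that sits outside a quoted section.
def remove_unquoted_a(subject)
  subject.gsub(UNQUOTED_A, '')
end

puts remove_unquoted_a('abcab "ab" ba "a"')
# prints: bcb "ab" b "a"
```

Because the lookahead rescans the rest of the string for every candidate a, the running time grows quickly on long inputs.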

If you can have escaped quotes within quotes (a "length: 2\""), it's still possible but will be more complicated:

result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:

(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^"\\] # any character except backslash or quote
)       # End of alternation
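
To see what the escape-aware version buys you, the sketch below runs both regexes on a string containing escaped quotes. The names and the sample string are invented for illustration; the two regexes are copied from the answer.

```ruby
# Hypothetical names; both regexes are taken verbatim from the answer.
SIMPLE_A       = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/
ESCAPE_AWARE_A = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

def remove_unquoted_a_escape_aware(subject)
  subject.gsub(ESCAPE_AWARE_A, '')
end

# Single-quoted Ruby literal, so \" stays as a backslash followed by a quote.
sample = 'ab "say \"a\"" ba'

puts remove_unquoted_a_escape_aware(sample)
# prints: b "say \"a\"" b

puts sample.gsub(SIMPLE_A, '')
# prints: b "say \"\"" b
```

The simple regex counts the escaped quotes as delimiters, so it also strips the a between them. The escape-aware one consumes each backslash together with the character after it and leaves the quoted section intact.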

js-coder, I'm resurrecting this old question because it has a simple solution that wasn't mentioned. (I found your question while doing some research for a regex bounty quest.)

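The regex is really tiny compared with the one in the accepted answer: ("[^"]*")|a

The left side of the alternation matches a complete quoted string and captures it in group 1. The right side matches an a, which at that point can only lie outside quotes, because every quoted string has already been consumed by the left side. The replacement puts group 1 back, so quoted strings are left unchanged, while a bare a (for which group 1 is nil) is replaced with the empty string.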

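subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
# A quoted string is captured in $1 and put back unchanged;
# for a bare a, $1 is nil, which gsub inserts as the empty string.
replaced = subject.gsub(regex) { $1 }
puts replaced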

See this live demo

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

A full-blown regex solution for regex lovers, with no regard for performance or code readability.

This solution assumes that there is no escaping syntax (with escaping syntax, the a in "sbd\"a" is counted as inside the string).

Pseudocode:

processedString = 
    inputString.replaceAll("\".*?\"", "") // Remove all quoted strings
               .replaceFirst("\".*", "")  // Consider text after lonely quote as inside quote

Then you can match the text you want in processedString. You can drop the second replacement if you consider text after a lone quote to be outside the quotes.

EDIT

In Ruby, the regex in the code above would be

/\".*?\"/

used with gsub

and

/\".*/

used with sub
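
Put together, the two steps might look like this in Ruby. This is a minimal sketch: the method name is hypothetical, and it keeps the answer's assumption that there is no escaping syntax.

```ruby
# Hypothetical helper: returns only the text that lies outside quotes.
def text_outside_quotes(input)
  # Step 1: remove every complete quoted string (non-greedy match).
  without_quoted = input.gsub(/".*?"/, '')
  # Step 2: treat everything after a lone, unclosed quote as quoted and drop it.
  without_quoted.sub(/".*/, '')
end

p text_outside_quotes('abcab "ab" ba "a"')
# prints: "abcab  ba "
p text_outside_quotes('ab "cd')
# prints: "ab "
```

This yields a string to search in, not a way to edit the original. The quoted parts are gone, so it does not by itself solve the replacement problem from the question.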


To address the replacement problem, I'm not sure whether this is possible, but it is worth trying:

  • Declare a counter.
  • Use the regex /("|a)/ with gsub, and supply a block.
  • In the block, if the match is ", increment the counter and return " as the replacement (basically, no change). If the match is a, check whether the counter is even: if it is, supply your replacement string; otherwise, return the match unchanged.
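
The steps above can be sketched in Ruby as follows. The method name and the replacement parameter are assumptions made for illustration; the logic is only the counter idea described in the list, under the same no-escaping assumption.

```ruby
# Hypothetical helper implementing the quote-counter idea.
def replace_unquoted_a(subject, replacement = '')
  quote_count = 0
  subject.gsub(/"|a/) do |match|
    if match == '"'
      quote_count += 1   # crossing a quote toggles inside/outside
      match              # keep the quote itself unchanged
    elsif quote_count.even?
      replacement        # even count: we are outside quotes
    else
      match              # odd count: inside quotes, leave the a alone
    end
  end
end

puts replace_unquoted_a('abcab "ab" ba "a"')
# prints: bcb "ab" b "a"
```

gsub walks the string from left to right, so the counter always reflects the number of quotes seen before the current match. The string is scanned only once, with no lookahead.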

