
Regular Expression for Ruby gsub! with anchor that should not match

The following regular expression should work in Ruby, but it doesn't. Any ideas on how to fix it, so it can be used in a .gsub! statement in a loop?

textfield.gsub!( /(http:\/\/){0}www\./, 'http://www.' )

{0} should require the first part to match zero times, but it does not:

'http://www.company1.com
 http://www.company2.com'.gsub!( /(http:\/\/){0}www\./, 'http://www.' )

=> "http://http://www.company1.com\n http://http://www.company2.com"

In this example the regexp should not match, and the input string should be left unmodified!

Any ideas on how to make this work?

This looks like a bug in Ruby's regexp processing.

I admit that I'm trying to generously interpret the semantics of {n} to include n = 0 :)

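The trouble is that /(http:\/\/){0}/ matches the empty string, so it succeeds at the start of any string. In fact, /(x){0}/ will match at the start of any string for any value of x. This regular expression says that we should find x zero times, and we can find x zero times between any two characters. The pattern in the question is therefore equivalent to /www\./, which matches whether or not http:// comes first.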
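
A quick way to see this is to wrap every match in brackets. The sketch below uses a helper called mark_matches, which is a name made up for this illustration and not part of Ruby:

```ruby
# Hypothetical helper: wraps each match of pattern in [] so that
# zero-width matches become visible.
def mark_matches(text, pattern)
  text.gsub(pattern) { |match| "[#{match}]" }
end

# (x){0} matches the empty string at every position.
puts mark_matches('abc', /(x){0}/)
# prints: []a[]b[]c[]

# With the question's pattern only "www." is matched;
# "http://" is never part of the match.
puts mark_matches('http://www.example.com', /(http:\/\/){0}www\./)
# prints: http://[www.]example.com
```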

What you want is the start-of-line anchor, ^, followed by a negative lookahead assertion, (?!...). This allows you to match strings that do not begin with a particular sequence of characters. (In Ruby, ^ matches at the start of each line; use \A to anchor at the start of the whole string.)

'http://www.example.com'.gsub(/^(?!http:\/\/)www\./, 'http://www.')
# => 'http://www.example.com'

'www.example.com'.gsub(/^(?!http:\/\/)www\./, 'http://www.')
# => 'http://www.example.com'
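
Since the question involves several URLs in one string, it may help to see how the anchor behaves on multi-line input. The following is only a sketch: the sample data, the constant names and the prefix_lines method are assumptions made for illustration.

```ruby
LINE_START   = /^(?!http:\/\/)www\./   # anchored at the start of each line
STRING_START = /\A(?!http:\/\/)www\./  # anchored at the start of the whole string

# Hypothetical helper: applies the substitution with the given pattern.
def prefix_lines(text, pattern)
  text.gsub(pattern, 'http://www.')
end

sample = "www.company1.com\nhttp://www.company2.com\nwww.company3.com"

puts prefix_lines(sample, LINE_START)
# http://www.company1.com
# http://www.company2.com
# http://www.company3.com

puts prefix_lines(sample, STRING_START)
# http://www.company1.com
# http://www.company2.com
# www.company3.com
```

Either anchor only helps when www. sits at the very beginning of a line or of the string; an indented or mid-sentence www. is left alone.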

It seems you need to make the capturing group optional.

> 'http://www.example.com'.gsub(/(http:\/\/)?www\./, 'http://www.')
=> "http://www.example.com"
> 'www.example.com'.gsub(/(http:\/\/)?www\./, 'http://www.')
=> "http://www.example.com"

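(http:\/\/)? matches the string http:// zero or one times. When the prefix is already present it becomes part of the match and is replaced by itself, so the string ends up unchanged.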
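
One consequence is worth knowing if you use gsub! in a loop: because the optional group also matches URLs that already carry the prefix, those count as substitutions even though the text does not change. A small sketch, with a hypothetical method name and made-up sample data:

```ruby
# Hypothetical helper: returns how many substitutions the
# optional-group pattern performs on text.
def count_substitutions(text)
  count = 0
  text.gsub(/(http:\/\/)?www\./) do
    count += 1
    'http://www.'
  end
  count
end

text = "http://www.company1.com\n www.company2.com"

puts text.gsub(/(http:\/\/)?www\./, 'http://www.')
# http://www.company1.com
#  http://www.company2.com

puts count_substitutions(text)
# 2
```

So gsub! with this pattern returns the string rather than nil whenever any www. is present, which matters only if you rely on the nil return value to detect changes.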

Alternatively, you could use a negative lookbehind.

> 'www.example.com'.gsub(/(?<!http:\/\/)www\./, 'http://www.')
=> "http://www.example.com"

Here the substitution should happen because the string www. isn't preceded by http:// .

> 'http://www.example.com'.gsub(/(?<!http:\/\/)www\./, 'http://www.')
=> "http://www.example.com"

Here the substitution won't happen because the string www. is preceded by http:// . So the interpreter returns the original input string without any modifications.
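
For the original use case, here is a minimal sketch of gsub! in a loop with the lookbehind pattern. The sample data and the names NEEDS_SCHEME and textfields are assumptions for illustration. Note that gsub! returns nil when it makes no substitution, which the loop uses to report what happened.

```ruby
NEEDS_SCHEME = /(?<!http:\/\/)www\./

# .dup gives mutable copies, so gsub! also works when string
# literals are frozen.
textfields = [
  "http://www.company1.com\n http://www.company2.com".dup,
  'see www.company3.com for details'.dup
]

textfields.each do |textfield|
  changed = textfield.gsub!(NEEDS_SCHEME, 'http://www.')
  puts(changed ? "updated: #{textfield}" : 'unchanged')
end
# prints:
# unchanged
# updated: see http://www.company3.com for details
```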
