Regex: Replace foo if it's a word or inside a URL

Given this:

str = "foo myfoo http://thefoobar.com/food is awesome";
str.replace(magicalRegex, 'bar');

The expected result is:

"bar myfoo http://thebarbar.com/bard is awesome"

I get the \\b(foo)\\b part, but I can't figure out how to match and capture foo from within a URL. For these purposes, assume URLs always start with http.

Any help?

You can use this code (it works with your example, but I haven't tried it with more complex inputs):

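str = 'foo myfoo http://thefoobar.com/food is awesome';
// Step 1: replace foo wherever it stands alone as a whole word.
str = str.replace(/\bfoo\b/g, 'bar');
// Step 2: each pass replaces only the first remaining foo in each URL,
// so repeat until no URL contains foo.
while (/http:\/\/[^\s]*?foo/.test(str))
    str = str.replace(/(http:\/\/[^\s]*?)?foo/g, function($0, $1) {
        // $1 is set only when foo was reached from an http:// prefix.
        return $1 ? $1 + 'bar' : $0;
    });
console.log(str);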

OUTPUT:

bar myfoo http://thebarbar.com/bard is awesome

Live Demo: http://ideone.com/8xGy2h

I think you are going to have to go multi-step to get this done right. Basically, you are doing two separate (albeit similar) regex replacements here:

  1. a global replacement of the character group "foo", if it occurs within a link, and
  2. a global replacement of the word "foo" in the rest of the string.

This code would run through both steps separately (URL first, rest of the string second) and give the final replacement:

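var urlPattern = /(http:\/\/[^\s]+)/;
var urlFooPattern = /(foo)/g;
var globalFooPattern = /\b(foo)\b/g;

var str = "foo myfoo http://thefoobar.com/food is awesome";

// Step 1: pull out the URL and replace every foo inside it.
// (match() returns null if the string contains no URL.)
var urlString = str.match(urlPattern)[0];
urlString = urlString.replace(urlFooPattern, "bar");

// Put the rewritten URL back in place of the original.
str = str.replace(urlPattern, urlString);

// Step 2: replace the remaining whole-word foo occurrences.
str = str.replace(globalFooPattern, "bar");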

Note: this assumes that there is only one URL in the string. Handling multiple URLs would be a good bit more complicated:

  1. Capture all of the URLs in an array using var urlString = str.match(urlPattern), with the g flag added to urlPattern
  2. Create a new array by looping through each URL and doing an individual "foo replace" on each
  3. Loop through the original array of matches and use each one as the pattern to be replaced by the updated value in the second array
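For multiple URLs, one way to avoid the array bookkeeping is to pass a function as the replacement, so that each URL is rewritten as it is matched. This is only a sketch under the question's assumptions (URLs start with http:// and end at the next whitespace); the helper name replaceFoo is made up for illustration:

```javascript
// Hypothetical helper; assumes URLs start with http:// and contain no whitespace.
function replaceFoo(input) {
    return input
        // Step 1: rewrite every foo inside each URL (note the g flag).
        .replace(/http:\/\/\S+/g, function (url) {
            return url.replace(/foo/g, 'bar');
        })
        // Step 2: replace standalone foo words in the rest of the string.
        .replace(/\bfoo\b/g, 'bar');
}

console.log(replaceFoo('foo http://foo.com/foo myfoo http://thefoobar.com/food'));
// bar http://bar.com/bar myfoo http://thebarbar.com/bard
```

The order of the two steps does not matter much here, since by the time the word-level replacement runs the URLs no longer contain foo.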

If you want to use boundaries to match only "foo" but not "myfoo", you'll need an or operation (|) to match the URLs as well: by necessity, if "foo" sits in the middle of a URL, it will not be surrounded by word boundaries.

Something like this should work for you:

\b(foo)\b|http\S*(foo)\S*

You can run further tests here if needed.
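Since an alternation like this matches the whole URL rather than just the foo inside it, it cannot be used with a plain replacement string. One possible workaround, sketched here with a simplified variant of the pattern (an assumption, not the pattern above verbatim), is to pair the alternation with a replacement function that decides what to do per match:

```javascript
// Sketch: one pattern, two alternatives. Because the regex scans left to right,
// a URL is consumed whole from its http prefix, so \bfoo\b never gets a chance
// to match inside it.
var pattern = /http\S*|\bfoo\b/g;

var result = 'foo myfoo http://thefoobar.com/food is awesome'.replace(pattern, function (match) {
    // A match starting with "http" is a URL: replace every foo inside it.
    // Anything else is the standalone word foo.
    return match.indexOf('http') === 0 ? match.replace(/foo/g, 'bar') : 'bar';
});

console.log(result);
// bar myfoo http://thebarbar.com/bard is awesome
```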


EDIT: Apologies, I thought the OP was looking to capture those words and URLs. Look-behind assertions, which would let you match foo without also consuming the root of the URL, weren't natively supported in JS when this was written as far as I know (they were added later, in ES2018), but they can frequently be emulated with a simple function; see here for a discussion of how to do so.
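For readers on a newer engine: ES2018 added look-behind assertions to JavaScript, so where that support can be assumed, the whole replacement can be sketched as a single pattern. This is an illustration under the question's assumptions (URLs start with http:// and contain no whitespace), not something tested beyond the example string:

```javascript
// Assumes an engine with ES2018 look-behind support.
// (?<=http:\/\/\S*) requires that foo is preceded by http:// with no
// whitespace in between, i.e. that foo sits inside a URL.
var pattern = /(?<=http:\/\/\S*)foo|\bfoo\b/g;

var result = 'foo myfoo http://thefoobar.com/food is awesome'.replace(pattern, 'bar');

console.log(result);
// bar myfoo http://thebarbar.com/bard is awesome
```

The look-behind inspects the original string rather than the partially replaced one, which is why no loop is needed to catch the second foo in the URL.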
