
Regex: Replace foo if it's a word or inside a URL

Given this:

str = "foo myfoo http://thefoobar.com/food is awesome";
str.replace(magicalRegex, 'bar');

The expected result is:

"bar myfoo http://thebarbar.com/bard is awesome"

I get the \\b(foo)\\b part, but I can't figure out how to match and capture foo from within a URL. For these purposes, assume URLs always start with http.

Any help?

You can use this code (it works well with your example, but I haven't tried it with more complex inputs):

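str = 'foo myfoo http://thefoobar.com/food is awesome';
// First replace "foo" wherever it stands alone as a whole word.
str = str.replace(/\bfoo\b/g, 'bar');
// Then keep going while any URL still contains "foo".
while (/http:\/\/[^\s]*?foo/.test(str))
    str = str.replace(/(http:\/\/[^\s]*?)?foo/g, function($0, $1) {
        // $1 is set only when this "foo" follows an http:// prefix.
        return $1 ? $1 + 'bar' : $0;
    });
console.log(str);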

OUTPUT:

bar myfoo http://thebarbar.com/bard is awesome

Live Demo: http://ideone.com/8xGy2h

I think you will have to take a multi-step approach to get this right. Basically, you are doing two separate (albeit similar) regex replacements here:

  1. a global replacement of the character group "foo", if it occurs within a link, and
  2. a global replacement of the word "foo" in the rest of the string.

This code would run through both steps separately (URL first, rest of the string second) and give the final replacement:

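var urlPattern = /(http:\/\/[^\s]+)/;
var urlFooPattern = /(foo)/g;
var globalFooPattern = /\b(foo)\b/g;

var str = "foo myfoo http://thefoobar.com/food is awesome";

// Step 1: pull out the (single) URL and replace every "foo" inside it.
var urlMatch = str.match(urlPattern);
if (urlMatch) {  // match() returns null when the string contains no URL
    var urlString = urlMatch[0].replace(urlFooPattern, "bar");
    str = str.replace(urlPattern, urlString);
}

// Step 2: replace "foo" as a whole word in the rest of the string.
str = str.replace(globalFooPattern, "bar");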

Note: this assumes that there is only one URL in the string. Handling the possibility of multiple URLs would be a good bit more complicated:

  1. Capture all of the URLs in an array using var urlString = str.match(urlPattern) (with the g flag added to urlPattern, since without it match() returns only the first URL).
  2. Create a new array by looping through each URL and doing an individual "foo replace" on each.
  3. Loop through the original array of matches and use each one as the pattern to be replaced by the corresponding updated value in the second array.

In other words: loop through all of the URLs returned by var urlString = str.match(urlPattern), replace "foo" in each one individually, then loop through again and substitute them back into the original string one at a time.
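The multi-URL steps above could be sketched roughly as follows. The function name replaceFooInAllUrls and the urls/updated variables are made up for illustration, and URLs are assumed to start with http:// and end at the first whitespace, as in the question.

```javascript
// Hypothetical helper; not part of the original answer.
function replaceFooInAllUrls(str) {
    // Step 1: collect every URL (note the g flag); match() returns null if there are none.
    var urls = str.match(/http:\/\/\S+/g) || [];

    // Step 2: build a second array with "foo" replaced inside each URL.
    var updated = urls.map(function (url) {
        return url.replace(/foo/g, "bar");
    });

    // Step 3: swap each original URL for its updated version, in order.
    // A function is used as the replacement so that "$" in a URL is taken literally.
    urls.forEach(function (url, i) {
        str = str.replace(url, function () { return updated[i]; });
    });

    // Finally, replace "foo" as a whole word in the rest of the string.
    return str.replace(/\bfoo\b/g, "bar");
}

console.log(replaceFooInAllUrls("foo http://foo.com/a and http://x.org/foofoo myfoo"));
// prints: bar http://bar.com/a and http://x.org/barbar myfoo
```

This has only been checked against simple inputs like the one shown; unusual URLs (for example, ones followed directly by punctuation) may need a stricter URL pattern.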

If you want to use boundaries to match only "foo" but not "myfoo", you'll need to use an or operation ( | ) to match the URLs as well: by necessity, if "foo" occurs in the middle of a URL, it will not be surrounded by word boundaries.

Something like this should work for you:

\b(foo)\b|http\S*(foo)\S*

You can run further tests here if needed.
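One way to turn that alternation into an actual replacement, sketched here under the question's assumption that URLs start with http and end at whitespace, is to pass a callback to replace() and decide per match what to return. The name replaceFooSinglePass is hypothetical, and the capture groups are dropped because the callback only needs the whole match.

```javascript
// Hypothetical helper built on the alternation idea above.
function replaceFooSinglePass(str) {
    // Either the stand-alone word "foo", or a URL (starting with "http")
    // that contains "foo" somewhere before the next whitespace.
    var pattern = /\bfoo\b|http\S*foo\S*/g;
    return str.replace(pattern, function (match) {
        // For a URL, replace every "foo" inside it; otherwise the match is the word itself.
        return match.indexOf("http") === 0 ? match.replace(/foo/g, "bar") : "bar";
    });
}

console.log(replaceFooSinglePass("foo myfoo http://thefoobar.com/food is awesome"));
// prints: bar myfoo http://thebarbar.com/bard is awesome
```

Because the whole URL is consumed as a single match, this version handles any number of URLs in one pass.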


EDIT: Apologies, I thought the OP was looking to capture those words and URLs. Look-behind regexes that won't capture the root of the URL for replacement aren't innately supported in JS as far as I know, but can frequently be duplicated with a simple function; see here for a discussion of how to do so.
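For what it's worth, lookbehind assertions were added to JavaScript in ES2018, so on an engine that supports them the replacement can be written without touching the URL prefix at all. The following is a minimal sketch under that assumption; the function name replaceFooWithLookbehind is made up for illustration.

```javascript
// Hypothetical helper; requires an engine with ES2018 lookbehind support.
function replaceFooWithLookbehind(str) {
    // \bfoo\b            : "foo" as a whole word, or
    // (?<=http:\/\/\S*)  : "foo" preceded by "http://" with no whitespace in between
    return str.replace(/\bfoo\b|(?<=http:\/\/\S*)foo/g, "bar");
}

console.log(replaceFooWithLookbehind("foo myfoo http://thefoobar.com/food is awesome"));
// prints: bar myfoo http://thebarbar.com/bard is awesome
```

On older engines without lookbehind, the pattern is a syntax error, so a callback-based approach is the safer choice there.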


 