
Regex to exclude words surrounded by special characters

I've been having trouble finding a solution to a regex problem.

Recently, I worked on a project where we needed to replace a list of words in a given text with a list of anchor tags.

For example, given the string

This is a test string

I may want to replace the word "test" with

<a target="_blank" href="https://website.com/string-random">test</a>

The resulting string should look like this:

This is a <a target="_blank" href="https://website.com/string-random">test</a> string

The words are replaced in a nested loop:

foreach ($documents as $document) {
    foreach ($links as $link) {
        // replace $link['keyword'] in $document with its anchor tag
    }
}

In some scenarios, the URLs in the anchor tags inserted by earlier passes contain words that are themselves on the replacement list, so later passes replace them too.

For example, given this list of words to replace

[
    {
        'keyword': 'test',
        'link': 'https://website.com/string-random'
    },
    {
        'keyword': 'string',
        'link': 'https://random.com/string'
    }
]

After all the replacements are done, the sample string above looks like this:

This is a <a target="_blank" href="https://website.com/<a target="_blank" href="https://random.com/string">string</a>-random">test</a> <a target="_blank" href="https://random.com/string">string</a>

instead of this:

This is a <a target="_blank" href="https://website.com/string-random">test</a> <a target="_blank" href="https://random.com/string">string</a>
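The nesting can be reproduced with a short script. This is only a sketch: it assumes the inner loop does a plain str_replace(), which the question does not actually state, and the variable names are illustrative.

```php
<?php
// Hypothetical reproduction: assumes the inner loop is a plain str_replace().
$links = [
    ['keyword' => 'test',   'link' => 'https://website.com/string-random'],
    ['keyword' => 'string', 'link' => 'https://random.com/string'],
];

$document = 'This is a test string';

foreach ($links as $link) {
    $anchor = '<a target="_blank" href="' . $link['link'] . '">' . $link['keyword'] . '</a>';
    // Replaces every occurrence, including ones inside URLs added by earlier passes.
    $document = str_replace($link['keyword'], $anchor, $document);
}

echo $document, PHP_EOL;
```

The second pass finds "string" inside the href written by the first pass, which is what produces the nested tag.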

I am currently looking for a regular expression that does not match words surrounded by special characters, as I think this would solve my problem.

I'm also very open to any other ideas on how to tackle this.

This is not just about the previous replacements: any word that occurs within tag names, attributes, or attribute values is an issue.

In other words, you want to replace only strings that sit between tags rather than inside them: after the string, the next < must occur before the next >.

So try this pattern: (string-to-match)(?=[^>]*?<)

(Replace string-to-match with your keyword, obviously.)

The second group is a lookahead: it ensures that, after the match, you can read any characters other than >, as many as needed, and then reach a <.
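As a rough sketch of how this lookahead could be wired into the replacement loop: the helper name linkBetweenTags() and the $links array shape are assumptions for illustration, and preg_quote() and htmlspecialchars() are added as precautions that the answer itself does not mention.

```php
<?php
// Sketch of the lookahead approach; linkBetweenTags() is a hypothetical helper.
function linkBetweenTags(string $html, array $links): string
{
    foreach ($links as $link) {
        // Match the keyword only if the next angle bracket after it is "<",
        // i.e. the keyword sits in text between tags, not inside a tag.
        $pattern = '~(' . preg_quote($link['keyword'], '~') . ')(?=[^>]*?<)~i';
        $url = htmlspecialchars($link['link'], ENT_QUOTES);
        $html = preg_replace_callback(
            $pattern,
            fn (array $m): string => '<a target="_blank" href="' . $url . '">' . $m[1] . '</a>',
            $html
        ) ?? $html; // keep the input unchanged if PCRE reports an error
    }
    return $html;
}

$links = [
    ['keyword' => 'test',   'link' => 'https://website.com/string-random'],
    ['keyword' => 'string', 'link' => 'https://random.com/string'],
];

echo linkBetweenTags('<p>This is a test string</p>', $links), PHP_EOL;
```

Note that the lookahead needs a < somewhere after the keyword, so the sample text is wrapped in <p>...</p> here; on a bare string with no tags at all, nothing would match. The pattern also has no word boundaries, so a keyword can match inside a longer word.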

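Try:

foreach ($wordlist as $word) {
    // Each (?! ) is a negative lookahead asserting that the next character is not a space.
    $document = preg_replace("~(?! )($word[keyword])(?! )~i", "<a href='$word[link]'>$1</a>", $document);
}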

I found a pattern that works pretty well for me:

$pattern = '/(?<!(>|\\/|-))\\b' . preg_quote($stringToReplace, '/') . '\\b(?!(<|\\/|-))/i';
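A minimal sketch of that pattern inside the replacement loop might look like the following. The helper name linkKeywords() and the $links array shape are assumptions, and the callback uses the whole match ($m[0]) because the pattern's only capture groups sit inside the lookarounds.

```php
<?php
// Sketch built around the pattern above; linkKeywords() is a hypothetical helper.
function linkKeywords(string $text, array $links): string
{
    foreach ($links as $link) {
        // Skip keywords directly preceded by >, / or - and directly followed by <, / or -.
        $pattern = '/(?<!(>|\\/|-))\\b' . preg_quote($link['keyword'], '/') . '\\b(?!(<|\\/|-))/i';
        $url = htmlspecialchars($link['link'], ENT_QUOTES);
        // $m[0] is the whole match; the lookarounds are zero-width, so it is just the keyword.
        $text = preg_replace_callback(
            $pattern,
            fn (array $m): string => '<a target="_blank" href="' . $url . '">' . $m[0] . '</a>',
            $text
        ) ?? $text; // keep the input unchanged if PCRE reports an error
    }
    return $text;
}

$links = [
    ['keyword' => 'test',   'link' => 'https://website.com/string-random'],
    ['keyword' => 'string', 'link' => 'https://random.com/string'],
];

echo linkKeywords('This is a test string', $links), PHP_EOL;
```

This is a heuristic rather than real HTML parsing: it protects "string" in the URL only because a / precedes it. A keyword such as "target" would still match the attribute name in an earlier anchor, since it is preceded by a space and followed by =.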
