
Word replacement using regex if the target word is not a part of another word

I am working on a regex that replaces a word only if it is a standalone word and not a part of another word. For example, take the word "thing". In "something", the substring "thing" should be ignored, but if the word "thing" is preceded by a special character such as a dot or a bracket, I want it captured. I also want the word captured if it is followed by a bracket, dot, or comma (or any other non-alphanumeric character).

In the string

Something is a thing, and one more thingy and (thing and more thing

In the sentence above, the 3 standalone occurrences of "thing" (after "a", after "(", and at the end) are the ones to be marked for replacement. I used the following regex:

\bthing\b

I tried the above sentence on regex101.com, and with this regex only the first word got highlighted. I understand that my regex would not capture (thing, but I thought it would capture the last word in the sentence, so that there would be at least 2 occurrences.

Can someone please help me modify my regex to capture all 3 occurrences in the sentence above?

You were likely using the JavaScript flavor on regex101.com, which stops after the first match unless the global flag is set. If you add the modifier g in the second box on regex101.com, it will find all matches. Note that \b also matches between ( and t, so (thing is found as well.

This site is better for C# regex testing: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

If you code this in C# and use the Matches method, it should match multiple times:

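using System;
using System.Text.RegularExpressions;

// Matches() returns every match in the input, not just the first one.
Regex regex = new Regex("\\bthing\\b");

foreach (Match match in regex.Matches(
    "Something is a thing, and one more thingy and (thing and more thing"))
{
    Console.WriteLine(match.Value);
}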
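Since the goal in the question is replacement rather than listing matches, the same pattern can also be handed to Regex.Replace, which replaces every match in one call. The following is only a sketch: the helper name ReplaceWholeWord and the replacement word "item" are made up for illustration and do not come from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordReplaceDemo
{
    // Hypothetical helper: replaces every standalone occurrence of `word`.
    // Regex.Escape guards against regex metacharacters inside `word`.
    public static string ReplaceWholeWord(string input, string word, string replacement)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Replace(input, pattern, replacement);
    }

    public static void Main()
    {
        string input = "Something is a thing, and one more thingy and (thing and more thing";
        // "Something" and "thingy" are left alone; the three standalone words change.
        Console.WriteLine(ReplaceWholeWord(input, "thing", "item"));
    }
}
```

Keep in mind that Regex.Replace interprets $ sequences in the replacement string as substitutions, so a replacement containing a literal $ would need it doubled.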

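A shorthand for an alphanumeric character, [0-9A-Za-z], is [^\W_] (written [^\\W_] inside a regular C# string literal). In .NET, \w is Unicode-aware by default, so [^\W_] also covers non-ASCII letters and digits.

Using a lookbehind and a lookahead, you'd get (as a C# string literal):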

(?<![^\\W_])thing(?![^\\W_])

Expanded:

 (?<! [^\W_] )       # Not alphanum behind
 thing               # 'thing'
 (?! [^\W_] )        # Not alphanum ahead
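The expanded, commented form can be used as-is in C#, assuming RegexOptions.IgnorePatternWhitespace is passed so that unescaped whitespace and # comments inside the pattern are ignored. A minimal sketch (the class and field names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExpandedPatternDemo
{
    // The verbatim string keeps the backslashes single; the option makes the
    // engine skip unescaped whitespace and '#' comments in the pattern.
    public static readonly Regex StandaloneThing = new Regex(@"
        (?<! [^\W_] )   # no letter or digit immediately before
        thing           # the literal word
        (?! [^\W_] )    # no letter or digit immediately after
        ", RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        string input = "Something is a thing, and one more thingy and (thing and more thing";
        foreach (Match m in StandaloneThing.Matches(input))
        {
            Console.WriteLine($"{m.Value} at index {m.Index}");
        }
    }
}
```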

This matches the 3 standalone occurrences of "thing" (but not "Something" or "thingy") in:

Something is a thing, and one more thingy and (thing and more thing
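On the sample sentence, \bthing\b and the lookaround version find the same 3 words. Where they differ is the underscore: \b treats _ as a word character, while [^\W_] does not. The comparison below is a sketch using made-up input strings that are not from the original question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryComparisonDemo
{
    public static readonly Regex WordBoundary = new Regex(@"\bthing\b");
    public static readonly Regex AlnumLookaround = new Regex(@"(?<![^\W_])thing(?![^\W_])");

    public static void Main()
    {
        // Hypothetical inputs chosen to show where the two patterns disagree.
        string[] samples = { "(thing)", "_thing_", "thing2", "some_thing" };
        foreach (string s in samples)
        {
            Console.WriteLine(
                $"{s}: \\b={WordBoundary.IsMatch(s)}, lookaround={AlnumLookaround.IsMatch(s)}");
        }
    }
}
```

Both patterns accept "(thing)" and reject "thing2"; only the lookaround version accepts "_thing_" and "some_thing". Which behavior is right depends on whether an underscore should count as part of a word in your data.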

