Regex replace identical word with “and”

In the following example sentence:

Green shirt green hat

Is it possible to use a regex to detect two identical words and replace the second with "and", so that the sentence becomes:

Green shirt and hat


A more difficult string example. Here the first of the identical words needs to be replaced:

You are an artistically gifted musically gifted individual

Should become:

You are an artistically and musically gifted individual

Description

First off, regex isn't the most ideal solution for this, but I'm sure you have your reasons for using it.

((\b[a-z]{1,}\b).*?)(\b\2\b)(.*)$

Replace with: \1and\4 (use the case-insensitive flag, otherwise [a-z] will not match the capitalised Green)

Summary

This regex finds the first pair of identical whole words in a line and replaces the second occurrence with "and".

Example

Live Demo

https://regex101.com/r/yG3yM6/2

Sample text

Green shirt green hat
Green shirt greenish hat
You are an artistically gifted musically gifted individual

Sample Matches

Green shirt and hat
Green shirt greenish hat
You are an artistically gifted musically and individual
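
The sample output above can be reproduced with a short JavaScript sketch. The helper name replaceSecondDuplicate is made up for illustration, and the i flag is an assumption: the answer shows only the bare pattern, but without that flag [a-z] would not match the capitalised Green. JavaScript writes the replacement as $1and$4 rather than \1and\4.

```javascript
// Pattern from the answer; the i flag is assumed so "Green" and "green" compare equal.
const duplicateWord = /((\b[a-z]{1,}\b).*?)(\b\2\b)(.*)$/i;

// Hypothetical helper: replaces the second occurrence of the first repeated word.
function replaceSecondDuplicate(line) {
  return line.replace(duplicateWord, "$1and$4");
}

const samples = [
  "Green shirt green hat",
  "Green shirt greenish hat",
  "You are an artistically gifted musically gifted individual",
];

// Process one line at a time, as in the sample text.
for (const line of samples) {
  console.log(replaceSecondDuplicate(line));
}
```

The second line is left unchanged because \b keeps green from matching inside greenish.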

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      \b                       the boundary between a word char (\w)
                               and something that is not a word char
----------------------------------------------------------------------
      [a-z]{1,}                any character of: 'a' to 'z' (at least
                               1 times (matching the most amount
                               possible))
----------------------------------------------------------------------
      \b                       the boundary between a word char (\w)
                               and something that is not a word char
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
    \2                       what was matched by capture \2
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
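
To see what each capture group in the table holds, it can help to print them for the first sample sentence. This is only a sketch: the helper name captureGroups is hypothetical and the i flag is assumed, as the answer does not state which flags were used.

```javascript
// Same pattern as in the answer, with an assumed i flag.
const explainedPattern = /((\b[a-z]{1,}\b).*?)(\b\2\b)(.*)$/i;

// Hypothetical helper: returns capture groups \1..\4, or null when nothing matches.
function captureGroups(line) {
  const match = line.match(explainedPattern);
  return match ? match.slice(1, 5) : null;
}

console.log(captureGroups("Green shirt green hat"));
// \1 = "Green shirt ", \2 = "Green", \3 = "green", \4 = " hat"
```

Group \1 is everything up to the repeated word and \4 is everything after it, which is why the replacement \1and\4 drops only group \3.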

Extra credit

Although not addressed in the OP, if the words in question use characters outside a-z, you could replace [a-z] with [a-z]|[^\x00-\x7F], which also matches non-English (non-ASCII) characters. We then also need to change \b\2\b to (?<=\s|^)\2(?=\s|$) to ensure correct matching.

((\b(?:[a-z]|[^\x00-\x7F]){1,}\b).*?)((?<=\s|^)\2(?=\s|$))(.*)$
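
A possible JavaScript rendering of this variant is sketched below. It assumes an engine with lookbehind support (ES2018 or later) and the i flag; the helper name is hypothetical. Note that in JavaScript \b treats only ASCII letters, digits and the underscore as word characters, so the example deliberately uses a word whose first and last letters are ASCII.

```javascript
// Extra-credit pattern from the answer, with an assumed i flag.
// Requires lookbehind support (ES2018+).
const unicodeDuplicate =
  /((\b(?:[a-z]|[^\x00-\x7F]){1,}\b).*?)((?<=\s|^)\2(?=\s|$))(.*)$/i;

// Hypothetical helper: same replacement as before, applied to one line.
function replaceSecondDuplicateUnicode(line) {
  return line.replace(unicodeDuplicate, "$1and$4");
}

console.log(replaceSecondDuplicateUnicode("Grün shirt grün hat"));
```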

Live Demo https://regex101.com/r/wD8yF5/2

By modifying this answer, you can do it:

console.log( myFunc("Green shirt green hat") );
console.log( myFunc("Big red eyed rabbits red Ferrari") );

function myFunc(str) {
  // Keep the first word ($1) and the text in between ($2); replace the repeat with "and".
  return str.replace(/\b(\w+)(.+)(\1)\b/gi, "$1$2and");
}
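
One design point in the pattern above is that (.+) is greedy, so when a word occurs three or more times it is the last occurrence that gets replaced. The sketch below contrasts this with a lazy (.+?) variant; the function names and the three-occurrence sentence are made up for illustration.

```javascript
// Greedy: (.+) runs as far as possible, so the LAST repeat is replaced.
function replaceLastRepeat(str) {
  return str.replace(/\b(\w+)(.+)(\1)\b/gi, "$1$2and");
}

// Lazy: (.+?) stops at the nearest repeat, so the SECOND occurrence is replaced.
function replaceNearestRepeat(str) {
  return str.replace(/\b(\w+)(.+?)(\1)\b/gi, "$1$2and");
}

const sentence = "Green shirt green hat green scarf";
console.log(replaceLastRepeat(sentence));    // Green shirt green hat and scarf
console.log(replaceNearestRepeat(sentence)); // Green shirt and hat green scarf
```

For sentences where the word appears exactly twice, both variants give the same result.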

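You can use the RegExp /(\bgreen\b)/ig, where green is the word to match, together with String.prototype.replace() and a replacement function that returns "and" for every match after the first.

p1, p2, ... The nth parenthesized submatch string, provided the first argument to replace() was a RegExp object. (Corresponds to $1, $2, etc. above.) For example, if /(\a+)(\b+)/, was given, p1 is the match for \a+, and p2 for \b+.

Because this pattern has only one capture group, the third argument passed to the function is the offset of the match rather than a second submatch. The code relies on the first occurrence starting at offset 0.

Replace green with and:

var str = "Green shirt green hat green";
// One capture group, so the arguments are (match, p1, offset); offset 0 is the first "Green".
var re = function(m, p1, offset) {
  return offset ? "and" : m;
};
str = str.replace(/(\bgreen\b)/ig, re);
console.log(str);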
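
The snippet above keeps the first match only when it sits at the very start of the string. A minimal sketch that tracks the first occurrence explicitly, and takes the word as a parameter, might look like this; replaceLaterOccurrences is a hypothetical name, not part of the original answer.

```javascript
// Hypothetical helper: keep the first occurrence of `word`, replace later ones with "and".
function replaceLaterOccurrences(str, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("\\b" + escaped + "\\b", "gi");
  let seenFirst = false;
  return str.replace(re, (match) => {
    if (!seenFirst) {
      seenFirst = true; // leave the first occurrence untouched
      return match;
    }
    return "and";
  });
}

console.log(replaceLaterOccurrences("Green shirt green hat green", "green"));
console.log(replaceLaterOccurrences("A green shirt green hat", "green"));
```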

You can use the following:

/(\b([^\s]+)\b.*?)\b\2\b/gi

Test case:

var regex = /(\b([^\s]+)\b.*?)\b\2\b/gi;
'Green shirt green hat with blue shoes blue glasses'.replace(regex, '$1and')
  === 'Green shirt and hat with blue shoes and glasses';
'Orange colored oranges orange belts'.replace(regex, '$1and')
  === 'Orange colored oranges and belts';

The answer to your first example, which I read as "replace the second occurrence of the first repeated word with 'and'", is:

var str = 'Green shirt green hat';
// $1 = first occurrence, $2 = text in between; the repeat ($3) becomes "and".
str = str.replace(/(\b\S+\b)(.+?)(\b\1\b)/i, '$1$2and');
console.log(str);

The answer to your second example, which I read as "replace the first occurrence of the first repeated word with 'and'", is:

var str = 'You are an artistically gifted musically gifted individual';
// The first occurrence becomes "and"; the word itself ($1) is written back in the second position.
str = str.replace(/(\b\S+\b)(.+?)(\b\1\b)/i, 'and$2$1');
console.log(str);
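
Since both answers share one pattern and differ only in the replacement string, they can be folded into a single function. This is a sketch under that assumption; the name replaceRepeated and its which parameter are invented here. It writes back $3 instead of $1 in the "first" case so that the second occurrence keeps its own capitalisation.

```javascript
// Same pattern as in the two snippets above.
const repeatedWord = /(\b\S+\b)(.+?)(\b\1\b)/i;

// Hypothetical helper: `which` is "first" or "second" (the occurrence to replace).
function replaceRepeated(str, which) {
  const replacement = which === "first" ? "and$2$3" : "$1$2and";
  return str.replace(repeatedWord, replacement);
}

console.log(replaceRepeated("Green shirt green hat", "second"));
console.log(replaceRepeated(
  "You are an artistically gifted musically gifted individual", "first"));
```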
