How to capture a group only if it occurs twice in a line

import re

text = """
Tumble Trouble Twwixt Two Towns!
Was the Moon soon in the Sea
Or soon in the sky?
Nobody really knows YET.
"""

How can I make the match happen only when the occurrence is found twice in a line?

I need a regular expression that highlights two adjacent 'o's only if another pair of adjacent 'o's occurs later in the same line.

You can match one or more word characters followed by a backreference to them, and wrap that pair in another group.

Because the groups are nested, the outer group is group 1 and the inner word characters become group 2.

Then use a positive lookahead to assert that the value of group 1 occurs again later in the line.

((\w+)\2)(?=.*?\1)

The pattern matches:

  • ( Capture group 1
    • (\w+)\2 Match 1+ word chars in capture group 2 followed by a backreference to group 2 to match the same again
  • ) Close group 1
  • (?=.*?\1) Positive lookahead to assert that the captured value of group 1 occurs again later in the line
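
As a minimal sketch of how the two groups come out in practice, the snippet below runs the pattern on two lines from the sample text. The variable names are illustrative only.

```python
import re

# Same pattern as above: group 1 is the doubled text, group 2 is the repeated unit.
pattern = re.compile(r"((\w+)\2)(?=.*?\1)")

match = pattern.search("Was the Moon soon in the Sea")
print(match.group(1))  # oo
print(match.group(2))  # o
print(match.span())    # (9, 11)

# Only one "oo" on this line, so the lookahead fails and nothing matches.
no_match = pattern.search("Or soon in the sky?")
print(no_match)        # None
```

Note that only the first "oo" is matched: after the second one, nothing is left in the line for the lookahead to find.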

Example

# Use a raw string for the replacement so \g<1> is not treated as an invalid escape sequence.
print(re.compile(r"((\w+)\2)(?=.*?\1)").sub(r'{\g<1>}', text.rstrip()))

Output

Tumble Trouble Twwixt Two Towns!
Was the M{oo}n soon in the Sea
Or soon in the sky?
Nobody really knows YET.
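
The match stays within a single line because `.` does not match a newline by default, so the lookahead cannot reach the next line. The sketch below contrasts this with `re.DOTALL`; the two-line input is a hypothetical example, not part of the original text.

```python
import re

two_lines = "Moon\nsoon"  # hypothetical input: one "oo" per line

per_line = re.compile(r"((\w+)\2)(?=.*?\1)")
across_lines = re.compile(r"((\w+)\2)(?=.*?\1)", re.DOTALL)

# Without DOTALL the lookahead stops at the newline, so there is no match.
print(per_line.search(two_lines))  # None

# With DOTALL the lookahead may cross the newline and find the second "oo".
highlighted = across_lines.sub(r"{\g<1>}", two_lines)
print(highlighted)
```

With `re.DOTALL` the first "oo" is wrapped in braces, which is usually not what you want when the requirement is "twice in the same line".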
