
Using Python regex with backreference matches

I have a question about regex backreferences.

I need to find repeated characters in a string. I tried the regex (\w)\1{1,}, but it only captures consecutive repeats. I'm stuck on how to improve it so that it captures all repeated characters. Some examples:

import re

s = 'capitals'

re.search(r'(\w)\1{1,}', s)

Output: None

import re

s = 'butterfly'

re.search(r'(\w)\1{1,}', s)

Output: <_sre.SRE_Match object; span=(2, 4), match='tt'>

I would use r'(\w).*\1' so that it matches any repeated character, even if there are special characters or spaces in between.

However, this won't work when repeated characters fall inside the span of an earlier match. In the string abcdabcd, for example, it only recognizes the first group (a) and ignores the other repeated characters enclosed in that match (b, c, d), because the match from the first a to the second a has already consumed them.

Check the demo: https://regex101.com/r/m5UfAe/1
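
To make the limitation concrete, here is a minimal sketch that compares the consuming pattern with a lookahead variant. The lookahead pattern (\w)(?=.*\1) is not part of the answer above; it is one possible workaround, and it assumes that "all repeated values" means every character that occurs again later in the string. The helper name repeated_chars is hypothetical.

```python
import re

def repeated_chars(s):
    """Hypothetical helper: characters of s that occur again later in s."""
    # The lookahead (?=.*\1) checks for a later copy without consuming it,
    # so characters inside an earlier match span are still examined.
    hits = re.findall(r'(\w)(?=.*\1)', s)
    # A character that occurs three or more times is reported more than once;
    # dict.fromkeys drops the duplicates and keeps first-seen order.
    return list(dict.fromkeys(hits))

# Consuming pattern: the match 'abcda' swallows b, c and d.
print(re.findall(r'(\w).*\1', 'abcdabcd'))  # ['a']

# Lookahead variant: each character is tested on its own.
print(repeated_chars('abcdabcd'))  # ['a', 'b', 'c', 'd']
print(repeated_chars('capitals'))  # ['a']
```

Note that . does not match a newline by default, so for multi-line input the lookahead would only see the rest of the current line unless re.DOTALL is passed.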

An alternative, depending on your needs, is to sort the string before matching, so that repeated characters end up next to each other:

import re
s = 'abcdabcde'
re.findall(r'(\w).*\1', ''.join(sorted(s)))

This returns the list of repeated characters: ['a', 'b', 'c', 'd']
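
As a rough sketch of the same idea applied to the strings from the question: once the string is sorted, equal characters are adjacent, so the consecutive-repeat pattern from the question is sufficient. The helper name repeated_chars_sorted is hypothetical, and the result comes back in sorted order rather than in the order of the original string.

```python
import re

def repeated_chars_sorted(s):
    """Hypothetical helper: sort s, then collect one character per repeated run."""
    ordered = ''.join(sorted(s))
    # After sorting, equal characters sit next to each other, so (\w)\1+
    # yields one match per run of two or more identical word characters.
    return re.findall(r'(\w)\1+', ordered)

for word in ('capitals', 'butterfly', 'abcdabcde'):
    print(word, repeated_chars_sorted(word))
# capitals ['a']
# butterfly ['t']
# abcdabcde ['a', 'b', 'c', 'd']
```

The comparison is case-sensitive, so 'A' and 'a' count as different characters in this sketch.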

I hope the code below helps you understand how backreferences work in Python regex.

The sample input string text contains two sets of information:

  1. Employee basic info:

    • starts with @employeename and ends with employeename
    • e.g. @daniel dxc chennai 45000 male daniel
  2. Employee designation:

    • starts with %employeename, followed by the designation, and ends with employeename%
    • e.g. %daniel python developer daniel%

import re

# sample input
# (each record is kept on one line, since '.' does not match a newline by default)
text = """
@daniel dxc chennai 45000 male daniel @henry infosys bengaluru 29000 male hobby- swimming henry
@raja zoho chennai 37000 male raja @ramu infosys bengaluru 99000 male hobby-badminton ramu
%daniel python developer daniel% %henry database admin henry%
%raja Testing lead raja% %ramu Manager ramu%
"""

# backreference to the employee name: (\w+)  <----  \1
# ----------------------------------------------
basic_info = re.findall(r'@+(\w+)(.*?)\1', text)
print(basic_info)

# (%) <-- \1  and  (\w+) <--- \2
# -------------------------------
designation = re.findall(r'(%)+(\w+)(.*?)\2\1', text)
print(designation)

# keep only the name and the designation, dropping the captured '%'
designation = [(name, role) for _, name, role in designation]
print(designation)
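
The same backreference idea can also be written with named groups, which may be easier to read than \1 and \2. This is only a sketch on a shortened sample; the group names name, info and title, and the added \s+ and \b that trim surrounding spaces, are my own choices and not part of the code above.

```python
import re

# Shortened sample in the same layout as the input above.
text = ("@daniel dxc chennai 45000 male daniel "
        "%daniel python developer daniel%")

# (?P<name>...) names a group; (?P=name) is a backreference to that group.
basic = re.search(r'@(?P<name>\w+)\s+(?P<info>.*?)\s+(?P=name)\b', text)
role = re.search(r'%(?P<name>\w+)\s+(?P<title>.*?)\s+(?P=name)%', text)

print(basic.group('name'), '|', basic.group('info'))
# daniel | dxc chennai 45000 male
print(role.group('name'), '|', role.group('title'))
# daniel | python developer
```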

