
Python regex to match a specific pattern

I need a regex to match patterns like:

'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'

My understanding is that these anomalies follow the format 'a aa aa a, a aa a': the first letter stands alone and the rest of the word is split into two-letter chunks, with a single letter left over where needed. A word with only three letters becomes 'a aa'. The strings above are just examples; many more words have this odd spacing.
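To make the format concrete, here is a minimal sketch of how such strings could be produced from normal words, which is handy for generating test input. The helper name break_word is hypothetical, and it assumes the splitting rule really is "first letter, then two-letter chunks", which is only inferred from the examples.

```python
def break_word(word):
    """Split a word into its first letter followed by two-letter chunks."""
    # word[:1] is the lone first letter; the rest is cut into pairs,
    # leaving a single trailing letter when the remainder has odd length.
    pieces = [word[:1]] + [word[i:i + 2] for i in range(1, len(word), 2)]
    return ' '.join(pieces)

print(break_word('Responsibilities'))  # R es po ns ib il it ie s
print(break_word('sap'))               # s ap
```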

Can someone help me with this? The goal is to match these patterns and remove the spaces so that they become normal words. Thank you in advance.

We can try using re.sub here along with a callback function:

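import re

inp = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
# Match one letter, then any number of two-letter chunks, then an optional
# trailing chunk of one or two letters; the callback removes the spaces.
output = re.sub(r'\w(?: \w{2})*(?: \w{1,2})?', lambda m: m.group().replace(' ', ''), inp)
print(output)  # Responsibilities, skills, required, sap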

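The strategy here is to match every term of the form x xx xx or y yy y (a single letter, then any number of two-letter chunks, then an optional trailing chunk of one or two letters) and then strip away the spaces in the callback.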
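One caveat: the pattern is not anchored, so it can also glue together pieces of ordinary text (for example, 'a cat' would lose its space). A minimal sketch, assuming you want matches limited to whole tokens, adds \b on both sides. The names SPACED_WORD and fix_spacing are hypothetical. Genuinely ambiguous input such as 'I am' still looks like a broken word and will be joined.

```python
import re

# The same pattern, wrapped in \b so a match must start and end on a word boundary.
SPACED_WORD = re.compile(r'\b\w(?: \w{2})*(?: \w{1,2})?\b')

def fix_spacing(text):
    """Join 'a aa aa a' style fragments back into single words."""
    return SPACED_WORD.sub(lambda m: m.group().replace(' ', ''), text)

broken = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
print(fix_spacing(broken))           # Responsibilities, skills, required, sap
print(fix_spacing('a cat is here'))  # a cat is here
```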

I'm not sure I understand your problem correctly, but try this:

>>> import re
>>> text = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
>>> re.sub(r'(\w)((?: \w\w)+)( \w?\w?)?,?',
...        lambda match: (match[1] + match[2] + (match[3] or '')).replace(' ', ''),
...        text)
'Responsibilities skills required sap'

You can test the regex at: https://regex101.com/r/mCjcNQ/1
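Note that the output above loses its commas: the pattern ends with `,?`, so the comma is pulled into the match, while the callback rebuilds the result from the three groups only. A minimal sketch, assuming you would rather keep the punctuation, drops the `,?` and rebuilds from the whole match. The function name join_fragments is hypothetical.

```python
import re

def join_fragments(text):
    # Same groups as before, minus the trailing `,?`, so commas stay outside the match.
    pattern = r'(\w)((?: \w\w)+)( \w?\w?)?'
    return re.sub(pattern, lambda m: m.group(0).replace(' ', ''), text)

text = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
print(join_fragments(text))  # Responsibilities, skills, required, sap
```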
