python regex to match a specific pattern

I need a regex to match patterns like:

'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'

My understanding is that these anomalies follow the format 'a aa aa a, a aa a': the first letter stands alone, the rest of the word is split into two-letter chunks, and an odd letter left over at the end forms its own chunk. If a word has only three letters, it becomes 'a aa'. The strings above are just examples; many more words have this weird spacing issue.
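To make the described format concrete, here is a small sketch that reproduces the broken spacing from a normal word. The helper name anomaly_spacing is hypothetical, and the rule it encodes (first letter alone, then two-letter chunks, leftover letter last) is inferred from the examples in the question, so real data may not always follow it.

```python
def anomaly_spacing(word: str) -> str:
    """Hypothetical helper: re-create the spacing pattern described above.

    The first character stays alone; the remaining characters are grouped
    in pairs, so an odd leftover character becomes the final chunk.
    """
    head, rest = word[:1], word[1:]
    chunks = [rest[i:i + 2] for i in range(0, len(rest), 2)]
    return " ".join([head] + chunks)


words = ["Responsibilities", "skills", "required", "sap"]
broken = ", ".join(anomaly_spacing(w) for w in words)
print(broken)  # R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap
```

Generating the broken strings this way also gives you test input for whatever regex you end up using.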

Can someone help me with this? The goal is to match these patterns, remove the spaces, and turn them back into normal words. Thank you in advance.

We can try using re.sub here along with a callback function:

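import re

inp = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
# Match a character followed by space-separated pairs (plus an optional shorter
# tail), then delete the spaces inside each match.
output = re.sub(r'\w(?: \w{2})*(?: \w{1,2})?', lambda m: m.group().replace(' ', ''), inp)
print(output)  # Responsibilities, skills, required, sap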

The strategy here is to match every 'x xx xx' or 'y yy y' term (a single character followed by space-separated two-character chunks, optionally ending in a shorter chunk) and then strip the spaces inside each match in the callback.
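One caveat with the pattern above is that it is not anchored to word boundaries, so it can also glue normal text together. For example, 'hello world' contains 'o wo', which matches and turns the string into 'helloworld'. One possible tightening, sketched below, puts \b on both ends so that a match must cover a whole run, and uses (?: \w\w)+ so that at least one pair is required. This is a variation of mine, not part of the answer, and it still cannot distinguish ordinary text such as 'I am' from a broken three-letter word, because both fit the 'a aa' shape.

```python
import re

# Variation (not from the answer): \b on both ends forces the match to span a
# whole "a aa aa a" run, and (?: \w\w)+ requires at least one pair.
BROKEN_WORD = re.compile(r"\b\w(?: \w\w)+(?: \w)?\b")


def fix_spacing(text: str) -> str:
    # Remove the spaces inside each matched run; leave everything else alone.
    return BROKEN_WORD.sub(lambda m: m.group().replace(" ", ""), text)


inp = "R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap"
print(fix_spacing(inp))            # Responsibilities, skills, required, sap
print(fix_spacing("hello world"))  # hello world
```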

I'm not sure I understand your problem correctly, but try this:

>>> import re
>>> text = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
>>> re.sub(r'(\w)((?: \w\w)+)( \w?\w?)?,?',  # the trailing ,? also consumes the comma
...        lambda match: (match[1] + match[2] + (match[3] or '')).replace(' ', ''),
...        text)
'Responsibilities skills required sap'

You can test the regex at: https://regex101.com/r/mCjcNQ/1
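If you would rather check the pattern locally instead of on regex101, a quick sketch is to list every match with its position using re.finditer. The pattern is the one from this answer; inspecting the matches shows that each one includes the trailing comma, which is why the commas disappear from the output above.

```python
import re

pattern = re.compile(r"(\w)((?: \w\w)+)( \w?\w?)?,?")
text = "R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap"

# Print each match with its (start, end) span so you can see exactly what
# the substitution will replace.
for m in pattern.finditer(text):
    print(m.span(), repr(m.group()))
```

The first match, for instance, is 'R es po ns ib il it ie s,' with the comma included.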

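Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.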

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM