Extract Only Whole Word That Has The First Letter Capitalized
I have a text file that needs to be analyzed. I am only interested in whole words whose first letter is capitalized.
For example, given the test string: Everyday HOLDS the poSSibility Of A Miracle
I want to capture: Everyday Of A Miracle
I am currently trying to build my regular expression in Python. Strangely, my regex captures only the first capitalized whole word.
Test String: Everyday HOLDS the poSSibility Of A Miracle
My regex: ^([A-Z])?([a-z])+
Capture: Everyday
What am I missing here?
Instead of anchoring the regex at the beginning of the string, use word-boundary checks:
import re
s = 'Everyday HOLDS the poSSibility Of A Miracle'
new_s = ' '.join(re.findall(r'\b[A-Z][a-z]+|\b[A-Z]\b', s))
Output:
'Everyday Of A Miracle'
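To see why the original pattern stopped after the first word, it may help to compare the two approaches side by side. The sketch below assumes the question's pattern was ^([A-Z])?([a-z])+ (simplified here by dropping the capture groups). The stricter `bounded` pattern, with a word boundary on both sides, is an illustrative variant and not part of the answer above; it also rejects mixed-case words such as EverYday, from which the answer's pattern would still pick up "Ever".

```python
import re

s = 'Everyday HOLDS the poSSibility Of A Miracle'

# Anchored with ^: a match can only begin at the start of the string,
# so at most one word is ever found (no re.MULTILINE here).
anchored = re.compile(r'^[A-Z]?[a-z]+')
anchored_matches = [m.group(0) for m in anchored.finditer(s)]

# Illustrative variant (an assumption, not the answer's pattern):
# one capital letter followed only by lowercase letters, with a word
# boundary on both sides so partial words are never matched.
bounded = re.compile(r'\b[A-Z][a-z]*\b')
bounded_matches = bounded.findall(s)

print(anchored_matches)
print(bounded_matches)
```

Whether the trailing \b is wanted depends on how words like EverYday should be treated; the question does not say.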
Without regex (only if the words are delimited by whitespace):
>>> s='Everyday HOLDS the poSSibility Of A Miracle'
>>> [x for x in s.split() if x.title()==x]
['Everyday', 'Of', 'A', 'Miracle']
Note that you can also use re.split to split on any non-letter characters.
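A minimal sketch of that re.split idea follows. The punctuated test string is made up for illustration, and the character class [^A-Za-z] assumes that only ASCII letters count as letters.

```python
import re

# Hypothetical input: the question's sentence with punctuation added.
s = 'Everyday, HOLDS the poSSibility: Of A Miracle!'

# Split on runs of non-letter characters; re.split can return empty
# strings at the edges, so filter those out.
words = [w for w in re.split(r'[^A-Za-z]+', s) if w]

# Keep the words that are unchanged by title-casing.
capitalized = [w for w in words if w.title() == w]
print(capitalized)
```

Because the split removes everything except letters, the str.title() comparison behaves the same way here as in the whitespace-only version.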