Regular expression to find a series of uppercase words in a string
import re

text = "This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE."
pattern = r'[A-Z]+[A-Z]+[A-Z]*[\s]+'
re.findall(pattern, text)
gives this output:
['TEXT ', 'CONTAINING ', 'UPPER ', 'CASE ', 'WORDS ', 'SECOND ']
But I want output like this:
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
You can use this regex:
\b[A-Z]+(?:\s+[A-Z]+)*\b
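Regex details:
\b : word boundary
[A-Z]+ : match a word made up of uppercase letters only
(?:\s+[A-Z]+)* : match 1+ whitespace characters followed by another uppercase word; match this group 0 or more times
\b : word boundary
Code: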
>>> import re
>>> s = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE'
>>> print(re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', s))
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
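The breakdown above can also be written directly into the pattern. This is a minimal sketch using `re.VERBOSE`, which lets the regex carry its own comments; the name `UPPER_RUN` is made up for illustration and is not part of the original answer.

```python
import re

# Same regex as in the answer, spelled out with re.VERBOSE so each part is commented.
# UPPER_RUN is a hypothetical name chosen for this sketch.
UPPER_RUN = re.compile(r"""
    \b              # word boundary
    [A-Z]+          # one all-uppercase word
    (?:\s+[A-Z]+)*  # zero or more further uppercase words, each preceded by whitespace
    \b              # word boundary
""", re.VERBOSE)

s = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE'
runs = UPPER_RUN.findall(s)
print(runs)  # ['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
```

Because `\s` also matches newlines, a run of uppercase words that wraps across a line break is returned as a single match, with the newline kept inside it.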
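Improve the regex: you need at least 2 uppercase letters, so use the dedicated syntax {2,} for "2 or more", and use word boundaries to make sure whole words are captured:
pattern = r'\b[A-Z]{2,}\b'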
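Then do the work sentence by sentence: find the sentences with a basic regex, find the uppercase words in each one, and store them in a list, joined with spaces: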
result = []
sentences = re.findall("[^.]+.", text)
for sentence in sentences:
    uppercase = re.findall(pattern, sentence)
    result.append(" ".join(uppercase))
print(result)  # ['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
As a list comprehension, it looks like this:
res = [" ".join(re.findall(pattern, sentence)) for sentence in re.findall("[^.]+.", text)]
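The two answers behave differently when the uppercase words in a sentence are not adjacent, or when single capitals such as "I" or "A" appear. The sketch below compares the per-sentence join with a combined pattern that applies `{2,}` to contiguous runs. The combined pattern and the sample string are assumptions added for illustration; neither comes from the original answers.

```python
import re

# Hypothetical sample text, chosen to include single capitals and a non-adjacent uppercase word.
sample = "I saw A BIG RED DOG and a SMALL cat. THE END."

word = r"\b[A-Z]{2,}\b"                    # one uppercase word of 2+ letters
run = r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b"    # assumed combination: contiguous runs of such words

# Per-sentence approach: every uppercase word in a sentence is joined, adjacent or not.
per_sentence = [" ".join(re.findall(word, sent)) for sent in re.findall("[^.]+.", sample)]

# Run approach: only words separated by nothing but whitespace are grouped together.
contiguous = re.findall(run, sample)

print(per_sentence)  # ['BIG RED DOG SMALL', 'THE END']
print(contiguous)    # ['BIG RED DOG', 'SMALL', 'THE END']
```

Which result is wanted depends on whether "a series" means consecutive words or all uppercase words in the same sentence.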