Regular expression to find a series of uppercase words in a string

Question

text = "This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE."

pattern = '[A-Z]+[A-Z]+[A-Z]*[\s]+'

re.findall(pattern, text) gives this output -->

['TEXT ', 'CONTAINING ', 'UPPER ', 'CASE ', 'WORDS ', 'SECOND ', 'SENTENCE ']

However, I want output like this -->

['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']

Answer 1

You may use this regex:

\b[A-Z]+(?:\s+[A-Z]+)*\b


RegEx Details:

\b : Word boundary
[A-Z]+ : Match a word comprising only uppercase letters
(?:\s+[A-Z]+)* : Match 1+ whitespace followed by another word with uppercase letters. Match this group 0 or more times
\b : Word boundary
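
The breakdown above can also be written into the pattern itself. This is a minimal sketch using `re.VERBOSE`, which lets whitespace and `#` comments sit inside the pattern; the name `UPPER_RUN` is only illustrative and is not part of the original answer.

```python
import re

# Same regex as above, spread over several lines with re.VERBOSE
# so each piece carries its own comment. UPPER_RUN is a made-up name.
UPPER_RUN = re.compile(
    r"""
    \b              # word boundary
    [A-Z]+          # one word made only of uppercase letters
    (?:\s+[A-Z]+)*  # 0 or more further uppercase words, each after 1+ whitespace
    \b              # word boundary
    """,
    re.VERBOSE,
)

text = "This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE."
runs = UPPER_RUN.findall(text)
print(runs)  # ['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
```

Because the group is non-capturing, `findall` returns the whole match rather than only the last repeated word.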

Code:

>>> import re
>>> s = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE'
>>> print(re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', s))
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
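
One thing to watch for: `[A-Z]+` also accepts one-letter words such as "I" or "A", while the pattern in the question required at least two uppercase letters. The sketch below compares the regex above with a variant that uses `{2,}` for every word. The sample sentence and the variable names are made up for illustration, and the variant is a possible adjustment, not part of the original answer.

```python
import re

# Hypothetical sample containing one-letter uppercase words ("I", "A").
sample = "I saw A TEXT IN CAPS and then NASA."

# Regex from the answer: any uppercase word, including single letters.
any_length = re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', sample)

# Variant: every word in the run must have at least two uppercase letters.
two_plus = re.findall(r'\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b', sample)

print(any_length)  # ['I', 'A TEXT IN CAPS', 'NASA']
print(two_plus)    # ['TEXT IN CAPS', 'NASA']
```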

Answer 2

To improve the regex: you want at least 2 uppercase letters, so use the dedicated quantifier {2,} (2 or more), and add word boundaries to make sure you match the whole word
```
pattern = r'\b[A-Z]{2,}\b'
```

Then do the job sentence by sentence: split the text into sentences with a basic regex, find the uppercase words in each sentence, and join them with a space before appending them to the result list

result = []
sentences = re.findall("[^.]+.", text)
for sentence in sentences:
    uppercase = re.findall(pattern, sentence)
    result.append(" ".join(uppercase))
print(result)  # ['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']

As a list comprehension, it looks like this:

res = [" ".join(re.findall(pattern, sentence)) for sentence in re.findall("[^.]+.", text)]
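
The two answers give the same result on the question's text, but they can differ on other input. The sketch below runs both on a made-up sample (the names `WORD`, `RUN`, and `sample` are illustrative): the per-sentence approach joins every uppercase word in a sentence even when lowercase words sit between them, and it yields an empty string for a sentence with no uppercase words, while the single regex from Answer 1 returns only runs of adjacent uppercase words.

```python
import re

WORD = r'\b[A-Z]{2,}\b'            # one uppercase word (this answer)
RUN = r'\b[A-Z]+(?:\s+[A-Z]+)*\b'  # run of adjacent uppercase words (Answer 1)

# Hypothetical sample: "bar" separates FOO from BAZ QUX, and the
# middle sentence has no uppercase words at all.
sample = "FOO bar BAZ QUX. nothing here. LAST one."

per_sentence = [" ".join(re.findall(WORD, s)) for s in re.findall("[^.]+.", sample)]
runs = re.findall(RUN, sample)

print(per_sentence)  # ['FOO BAZ QUX', '', 'LAST']
print(runs)          # ['FOO', 'BAZ QUX', 'LAST']

# Optional: drop the empty entries if they are not wanted.
non_empty = [s for s in per_sentence if s]
print(non_empty)     # ['FOO BAZ QUX', 'LAST']
```

Which behaviour is right depends on whether "a series of uppercase words" means adjacent words or all uppercase words in a sentence.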

2 answers

solution1
1 ACCEPTED 2020-03-18 11:09:58

solution2
0 2020-03-18 11:11:13
