Using regex to find all phrases that are completely capitalized

Question

I want to use a regex to match all substrings that are completely capitalized, including the spaces.

Right now I am using the regex \w*[A-Z]\s] on this input:

HERE IS Test WHAT ARE WE SAYING

Which returns:

HERE
IS
WHAT
ARE 
WE
SAYING

However, I would like it to match each whole all-caps substring, so that it returns:

HERE IS 
WHAT ARE WE SAYING

Answer 1

You can use word boundaries \b and [^\s] to keep a match from starting or ending with a space. Put together, it might look something like:

import re

text = "HERE IS Test WHAT ARE WE SAYING is that OKAY"

# \b plus [^\s] at each end keeps a match from starting or ending on a space
pattern = re.compile(r"\b[^\s][A-Z\s]+[^\s]\b")
print(pattern.findall(text))

['HERE IS', 'WHAT ARE WE SAYING', 'OKAY']
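Two edge cases of this pattern are worth checking. Because [^\s] accepts any non-space character, the first and last characters of a match do not have to be uppercase; and because the pattern needs at least three characters, an all-caps phrase shorter than that (such as I or OK on its own) is skipped. The inputs below are made up purely to illustrate this:

```python
import re

# Answer 1's pattern, unchanged
pattern = re.compile(r"\b[^\s][A-Z\s]+[^\s]\b")

# [^\s] also matches lowercase letters, so a mixed-case word can slip through
mixed = pattern.findall("aBCd")
print(mixed)  # ['aBCd']

# At least three characters are required, so "I" and "OK" are not found
short = pattern.findall("I am OK")
print(short)  # []
```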

Answer 2

One option is to use re.split with the pattern \s*(?:\w*[^A-Z\s]\w*\s*)+:

import re

text = "HERE IS Test WHAT ARE WE SAYING"
# Split on each run of words that contain a non-uppercase character
parts = re.split(r'\s*(?:\w*[^A-Z\s]\w*\s*)+', text)
print(parts)

['HERE IS', 'WHAT ARE WE SAYING']

The idea is to split on any run of consecutive words in which every word contains at least one character that is neither an uppercase letter nor whitespace (a lowercase letter, for example).
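One caveat with re.split: when the text starts or ends with a word that is not all caps, the split leaves empty strings at the ends of the list. A minimal sketch, using a made-up input, of filtering them out:

```python
import re

# Answer 2's pattern: a run of words that each contain a non-uppercase character
splitter = re.compile(r'\s*(?:\w*[^A-Z\s]\w*\s*)+')

parts = splitter.split("Well HERE IS Test")
print(parts)  # ['', 'HERE IS', '']

# Drop the empty strings left at the start and end
phrases = [p for p in parts if p]
print(phrases)  # ['HERE IS']
```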

Answer 3

You could use re.findall:

import re

text = 'HERE IS Test WHAT ARE WE SAYING'
print(re.findall(r'[\sA-Z]+(?![a-z])', text))

Output

['HERE IS ', ' WHAT ARE WE SAYING']

The pattern [\sA-Z]+(?![a-z]) matches a run of whitespace and uppercase letters that is not followed by a lowercase letter. The (?![a-z]) part is a negative lookahead (see Regular Expression Syntax in the Python re documentation).
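The matches above keep their leading and trailing spaces. One possible post-processing step, sketched here rather than taken from the answer, strips each match and discards any that were only whitespace:

```python
import re

text = 'HERE IS Test WHAT ARE WE SAYING'

# Answer 3's pattern: runs of spaces/capitals not followed by a lowercase letter
raw = re.findall(r'[\sA-Z]+(?![a-z])', text)

# Trim surrounding whitespace and skip matches that were only whitespace
phrases = [m.strip() for m in raw if m.strip()]
print(phrases)  # ['HERE IS', 'WHAT ARE WE SAYING']
```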

Answer 4

You can use [A-Z ]+ to match capital letters and spaces, and use a negative lookahead (?! ) and a negative lookbehind (?<! ) to forbid the first and last characters from being a space.

Finally, surrounding the pattern with \b (word boundaries) makes it match only whole words.

import re

text = "A ab ABC ABC abc Abc aBc abC C"
# \b: whole words only; (?! ) and (?<! ): no leading or trailing space
pattern = r'\b(?! )[A-Z ]+(?<! )\b'

print(re.findall(pattern, text))
['A', 'ABC ABC', 'C']
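To see why the lookahead and lookbehind are needed, compare the same text with and without them. Without them, \b is also satisfied at the edge between a space and a letter, so some matches start or end with a space, or consist of a single space:

```python
import re

text = "A ab ABC ABC abc Abc aBc abC C"

# Word boundaries only: a match may begin or end on a space
without = re.findall(r'\b[A-Z ]+\b', text)
print(without)

# Answer 4's pattern, with lookarounds forbidding a leading or trailing space
with_lookarounds = re.findall(r'\b(?! )[A-Z ]+(?<! )\b', text)
print(with_lookarounds)  # ['A', 'ABC ABC', 'C']
```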

Answer 5

You can also use the following method:

>>> import re
>>> s = 'HERE IS Test WHAT ARE WE SAYING'
>>> print(re.findall(r'((?!\s+)[A-Z\s]+(?![a-z]+))', s))

OUTPUT:

['HERE IS ', 'WHAT ARE WE SAYING']
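The + inside each lookahead and the outer capturing group do not change the result: (?!\s+) fails exactly when (?!\s) fails, and because the single group spans the whole pattern, findall returns the same text as the full match. A simplified but equivalent form, offered as a sketch (note that 'HERE IS ' still keeps its trailing space):

```python
import re

s = 'HERE IS Test WHAT ARE WE SAYING'

# Answer 5's pattern as written
original = re.findall(r'((?!\s+)[A-Z\s]+(?![a-z]+))', s)
# Same logic without the redundant quantifiers and group
simplified = re.findall(r'(?!\s)[A-Z\s]+(?![a-z])', s)

print(simplified)  # ['HERE IS ', 'WHAT ARE WE SAYING']
```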

Answer 6

Using findall() without matching leading and trailing spaces:

import re
s = "HERE IS Test WHAT ARE WE SAYING"
re.findall(r"\b[A-Z]+(?:\s+[A-Z]+)*\b", s)
Out: ['HERE IS', 'WHAT ARE WE SAYING']
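The same pattern can be written with re.VERBOSE so each part carries a comment. This is only a restyling of the regex above; verbose mode ignores literal whitespace in the pattern, which is harmless here because the pattern matches spaces with \s:

```python
import re

# Answer 6's pattern, spread out with re.VERBOSE for readability
ALL_CAPS_PHRASE = re.compile(r"""
    \b          # start at a word boundary
    [A-Z]+      # first all-caps word
    (?:         # then zero or more times:
        \s+     #   whitespace
        [A-Z]+  #   another all-caps word
    )*
    \b          # end at a word boundary
""", re.VERBOSE)

s = "HERE IS Test WHAT ARE WE SAYING"
phrases = ALL_CAPS_PHRASE.findall(s)
print(phrases)  # ['HERE IS', 'WHAT ARE WE SAYING']
```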

6 answers

solution1 2 ACCEPTED 2018-11-28 02:20:24

solution2 0 2018-11-28 02:06:36

solution3 0 2018-11-28 02:13:33

solution4 0 2018-11-28 04:48:06

solution5 0 2018-11-28 08:02:06

solution6 0 2018-11-28 09:05:40
