Looking for a good way to split a string on all-capital words

Question

For example, I have an arbitrary string:

var = 'I have a string I want GE and APPLES but nothing else'

What's the best way to split the string in Python so that I get just 'GE' and 'APPLES'? In Java I'd split on spaces, then check each array element for two or more consecutive capital letters and keep the ones that match.

Is there a better way to do this in Python? I'm not particularly well versed in Python's regex.

Answer 1

Using str.isupper, str.split and a list comprehension:

>>> var = 'I have a string I want GE and APPLES but nothing else'
>>> [x for x in var.split() if x.isupper() and len(x) > 1]
['GE', 'APPLES']
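One thing to watch with the str.isupper approach: isupper only looks at cased characters, so a token like 'GE,' passes the check and comes back with the comma still attached. A minimal sketch, assuming leading and trailing ASCII punctuation should be stripped before testing (capital_words and min_len are hypothetical names, not part of the answer):

```python
import string

def capital_words(text, min_len=2):
    """Return whitespace-separated tokens whose cased characters are all uppercase."""
    result = []
    for token in text.split():
        # Drop leading/trailing ASCII punctuation such as ',' or '.'
        word = token.strip(string.punctuation)
        # isupper() needs at least one cased char, and every cased char must be uppercase
        if len(word) >= min_len and word.isupper():
            result.append(word)
    return result

print(capital_words('I want GE, and APPLES. but nothing else'))  # ['GE', 'APPLES']
```

Note that digits are not cased, so a token such as 'A1' would still count as "all capitals" here.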

Using regex:

>>> import re
>>> re.findall(r'\b[A-Z]{2,}\b', var)
['GE', 'APPLES']
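For anyone less comfortable with regex syntax, the same pattern can be written in re.VERBOSE mode, where whitespace in the pattern is ignored and # starts a comment, so each piece can be annotated. Unlike the split approach, \b stops at punctuation, so 'GE,' yields 'GE'. Two caveats: [A-Z] only covers ASCII capitals, and \b treats digits and underscores as word characters, so 'GE2' is not matched even though 'GE2'.isupper() is True. CAPS_WORD is just an illustrative name.

```python
import re

# Same pattern as r'\b[A-Z]{2,}\b', written in verbose mode so each part can be commented
CAPS_WORD = re.compile(r"""
    \b          # word boundary: start of a word
    [A-Z]{2,}   # two or more ASCII capital letters
    \b          # word boundary: end of a word
""", re.VERBOSE)

var = 'I have a string I want GE and APPLES but nothing else'
print(CAPS_WORD.findall(var))                    # ['GE', 'APPLES']
print(CAPS_WORD.findall('GE, APPLES. Apple A'))  # ['GE', 'APPLES']
```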

Timing comparison (IPython's %timeit):

>>> var = 'I have a string I want GE and APPLES but nothing else'*10**5
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1]
1 loops, best of 3: 773 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 491 ms per loop

Input with very long words:

>>> var = ' '.join(['FOO'*1000, 'bar'*1000, 'SPAM'*1000]*1000)
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1]
1 loops, best of 3: 224 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 483 ms per loop
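%timeit is an IPython magic and does not exist in the plain python interpreter. A rough equivalent using the standard-library timeit module might look like this; with_split and with_regex are hypothetical wrapper names around the two one-liners from this answer. No timings are shown because they depend heavily on the machine and Python version.

```python
import re
import timeit

def with_split(text):
    # str.split + str.isupper approach from the answer
    return [x for x in text.split() if x.isupper() and len(x) > 1]

def with_regex(text):
    # re.findall approach from the answer
    return re.findall(r'\b[A-Z]{2,}\b', text)

# The two inputs used in the answer's timings
inputs = {
    'short words': 'I have a string I want GE and APPLES but nothing else' * 10**5,
    'huge words': ' '.join(['FOO' * 1000, 'bar' * 1000, 'SPAM' * 1000] * 1000),
}

for label, text in inputs.items():
    for func in (with_split, with_regex):
        # One call per run, three runs, keep the fastest
        best = min(timeit.repeat(lambda: func(text), number=1, repeat=3))
        print(f'{label:12} {func.__name__:10} {best * 1000:.0f} ms')
```

Using number=1, repeat=3 and taking the minimum roughly mirrors the "1 loops, best of 3" lines above.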

solution1
3 ACCEPTED 2013-12-06 22:24:38

Looking for a good way to split a string on all-capital words

Question

1 answers

solution1 3 ACCPTED 2013-12-06 22:24:38

solution1
3 ACCPTED 2013-12-06 22:24:38