
Looking for a good way to split a string on all-capital words

For example, I have an arbitrary string:

var = 'I have a string I want GE and APPLES but nothing else'

What's the best way to split the string in Python so that I obtain just 'GE' and 'APPLES'? In Java I'd split on spaces, check each array element for two or more consecutive capital letters, and keep the ones that match.

Is there a better way to do it in Python? I'm not particularly well versed in Python's regex.

Using str.isupper, str.split and a list comprehension:

>>> var = 'I have a string I want GE and APPLES but nothing else'
>>> [x for x in var.split() if x.isupper() and len(x) > 1]
['GE', 'APPLES']

Using regex:

>>> import re
>>> re.findall(r'\b[A-Z]{2,}\b', var)
['GE', 'APPLES']
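
The two approaches agree on the sample string but are not strictly equivalent. As a minimal sketch (the helper names `caps_by_split` and `caps_by_regex` and the `sample` string are made up for illustration, not part of the original answer), the following shows where they can diverge: `str.isupper` ignores uncased characters such as punctuation and digits, while the regex only accepts runs of ASCII capitals between word boundaries.

```python
import re

# Hypothetical helper names, introduced only for this illustration.
UPPER_WORD = re.compile(r'\b[A-Z]{2,}\b')

def caps_by_split(text):
    """Whitespace-separated tokens longer than one character for which
    str.isupper() is true (uncased characters are ignored by isupper)."""
    return [w for w in text.split() if w.isupper() and len(w) > 1]

def caps_by_regex(text):
    """Runs of two or more ASCII capital letters between word boundaries."""
    return UPPER_WORD.findall(text)

# Assumed sample with punctuation and a digit, to expose the difference.
sample = 'I want GE, APPLES and B2 but nothing else'
print(caps_by_split(sample))   # ['GE,', 'APPLES', 'B2']
print(caps_by_regex(sample))   # ['GE', 'APPLES']
```

If the input may contain punctuation attached to words, the regex version is probably closer to what the question asks for; if it may contain non-ASCII capitals, the `isupper` version handles them while `[A-Z]` does not.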

Timing comparison:

>>> var = 'I have a string I want GE and APPLES but nothing else'*10**5
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1]
1 loops, best of 3: 773 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 491 ms per loop

Input with huge words:

>>> var = ' '.join(['FOO'*1000, 'bar'*1000, 'SPAM'*1000]*1000)
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1]
1 loops, best of 3: 224 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 483 ms per loop
