For example, I have an arbitrary string:
var = 'I have a string I want GE and APPLES but nothing else'
What's the best way to split the string in Python so that I obtain just 'GE' and 'APPLES'? In Java I'd split on spaces, check each array element for two or more consecutive uppercase letters, and keep the ones that match.
Is there a better way to do it in Python? I'm not particularly well versed in Python's regex.
Using str.isupper, str.split and a list comprehension:
>>> var = 'I have a string I want GE and APPLES but nothing else'
>>> [x for x in var.split() if x.isupper() and len(x) > 1 ]
['GE', 'APPLES']
Using regex:
>>> import re
>>> re.findall(r'\b[A-Z]{2,}\b', var)
['GE', 'APPLES']
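The two approaches agree on the example string, but they are not equivalent in general. The sketch below wraps each one in a function (the names upper_words_split and upper_words_regex and the sample string are made up for illustration) and runs them on input containing punctuation and digits, where str.isupper only looks at cased characters while the regex demands an unbroken run of A-Z:

```python
import re

def upper_words_split(text):
    """Whitespace-separated tokens that pass str.isupper and are longer than one character."""
    return [w for w in text.split() if w.isupper() and len(w) > 1]

def upper_words_regex(text):
    """Runs of two or more ASCII capitals delimited by word boundaries."""
    return re.findall(r'\b[A-Z]{2,}\b', text)

# Hypothetical sample: note the comma attached to GE and the digit inside B2B.
sample = 'I want GE, APPLES and B2B but nothing else'

print(upper_words_split(sample))  # ['GE,', 'APPLES', 'B2B']
print(upper_words_regex(sample))  # ['GE', 'APPLES']
```

Which result is "right" depends on what should count as a word; if the input may contain punctuation, the regex is probably the closer match to the question as asked.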
Timing comparison (the regex is faster on many short words; the list comprehension is faster on a few huge words):
>>> var = 'I have a string I want GE and APPLES but nothing else'*10**5
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1 ]
1 loops, best of 3: 773 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 491 ms per loop
#Input with huge words:
>>> var = ' '.join(['FOO'*1000, 'bar'*1000, 'SPAM'*1000]*1000)
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1 ]
1 loops, best of 3: 224 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 483 ms per loop
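%timeit only works inside IPython. For a plain Python script, a rough equivalent can be sketched with the standard timeit module. The helper names with_split and with_regex are made up here, the input is smaller than in the runs above, and a trailing space is added so that repeated copies do not fuse 'else' and 'I' into one word. Absolute numbers will vary by machine and Python version.

```python
import re
import timeit

# Smaller input than the runs above; the trailing space keeps repeats separated.
var = 'I have a string I want GE and APPLES but nothing else ' * 10**3

# Precompiled pattern; re.findall with a string pattern caches it anyway.
pattern = re.compile(r'\b[A-Z]{2,}\b')

def with_split():
    return [x for x in var.split() if x.isupper() and len(x) > 1]

def with_regex():
    return pattern.findall(var)

calls = 10
for fn in (with_split, with_regex):
    # Take the best of three repeats, as %timeit reports.
    best = min(timeit.repeat(fn, number=calls, repeat=3))
    print(f'{fn.__name__}: {best / calls * 1e3:.3f} ms per call')
```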