Operating On re.findall()

Question

Is there a better way to do this? I'd like to transform each match into a string as it is found, rather than building the whole list first and then transforming each item in it:

# findall() returns an empty list when nothing matches, so no None fallback is needed
# (map() over None would raise a TypeError)
aList = regexObj.findall(s.text)

self._menuUrls = map( lambda x: str( 'https://....' + x + '?otherparams=...' ), aList )

Is there a pre-made method I could use to do this in one pass or would this require that I create a separate method/lambda? Could I be more efficient in how I approach this?

EDIT: I benchmarked several approaches on a file containing 500k matchable instances and found that a list comprehension over re.findall() is 40-50% faster than a list comprehension over re.finditer() for transforming matches as they are found.

import re
import time

# `lines` is the input read from the test file (loading code not shown in the original)
menuUrls = []

start = time.time()

# Raw string so the regex escapes are not interpreted as string escapes
regex = re.compile(r"javascript:iframeLink\('([^']+)'\);")

#My Original Solution = 0.78200006485
# (Python 2: map() returns a list here)
menuUrls = map( lambda x: str('http://...' + x + '?param=...'), regex.findall(str(lines)))

#My Revised Solution = 0.619000196457
menuUrls = [ str('http://...' + x + '?param=...') for x in regex.findall(str(lines)) ]

#Friend's Proposal = 0.802000045776
for m in regex.finditer(str(lines)):
    menuUrls.append(str('http://...' + m.group(1) + '?param=...'))

#Stack Proposal = 0.912000179291
# Note: group(0) is the whole match, not the captured group that findall() returns
menuUrls = [ str('http://...' + x.group(0) + '?param=...') for x in regex.finditer(str(lines)) ]

set(menuUrls)

print(time.time() - start)

Answer 1

You are looking for re.finditer. Something like:

regex_iter = regexObj.finditer(s.text)
self._menuUrls = ['https://....' + x.group(0) + '?otherparams=...' for x in regex_iter]

This is marginal, but generally, a list comprehension will be faster than map with a lambda (indeed, than map with any other non-builtin function).
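If the goal is to transform each match as it is found without ever holding the full list, one option is a generator expression over finditer(). The following is only a minimal sketch: the pattern, the input text and the `https://example.invalid/` prefix are hypothetical stand-ins, not the asker's real values.

```python
import re

# Hypothetical pattern and text, chosen only for illustration.
pattern = re.compile(r"item(\d+)")
text = "item1 item22 item333"

# Generator expression: no match is searched for or transformed
# until the generator is iterated.
urls = ("https://example.invalid/" + m.group(1) for m in pattern.finditer(text))

first = next(urls)        # only the first match has been processed so far
remaining = list(urls)    # consume the rest

print(first)
print(remaining)
```

This trades the speed of a single findall() call for lower peak memory, which probably only matters when the input is very large or when iteration may stop early.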

Demonstrations:

>>> import re
>>> text = "1 234 6 889 33 5 777 dff hd ae 2  ggre 777 fdf"
>>> pattern = re.compile(r"\d+")
>>> nums = ['<'+ m.group(0) + '>' for m in pattern.finditer(text)]
>>> nums
['<1>', '<234>', '<6>', '<889>', '<33>', '<5>', '<777>', '<2>', '<777>']
>>>
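The claim above about map with a lambda versus a list comprehension can be checked with timeit. This is a rough sketch on made-up data (the `items` list is an assumption standing in for regex matches); absolute numbers will vary by machine and Python version.

```python
import timeit

# Hypothetical input: 100,000 short strings standing in for regex matches.
items = [str(i) for i in range(100000)]

def with_map():
    # list() forces evaluation on Python 3, where map() is lazy.
    return list(map(lambda x: "<" + x + ">", items))

def with_comprehension():
    return ["<" + x + ">" for x in items]

# Both approaches must produce the same list before timing means anything.
assert with_map() == with_comprehension()

# Take the best of several repeats to reduce noise from other processes.
map_time = min(timeit.repeat(with_map, number=5, repeat=3))
comp_time = min(timeit.repeat(with_comprehension, number=5, repeat=3))

print("map + lambda:       %.4f s" % map_time)
print("list comprehension: %.4f s" % comp_time)
```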

Answer 2

import re
import time

# `lines` is the input read from the test file (loading code not shown in the original)
menuUrls = []

start = time.time()

# Raw string so the regex escapes are not interpreted as string escapes
regex = re.compile(r"javascript:iframeLink\('([^']+)'\);")

#My Original Solution = 0.78200006485
# (Python 2: map() returns a list here)
menuUrls = map( lambda x: str('http://...' + x + '?param=...'), regex.findall(str(lines)))

#My Revised Solution = 0.619000196457
menuUrls = [ str('http://...' + x + '?param=...') for x in regex.findall(str(lines)) ]

#Friend's Proposal = 0.802000045776
for m in regex.finditer(str(lines)):
    menuUrls.append(str('http://...' + m.group(1) + '?param=...'))

#Stack Proposal = 0.912000179291
# Note: group(0) is the whole match, not the captured group that findall() returns
menuUrls = [ str('http://...' + x.group(0) + '?param=...') for x in regex.finditer(str(lines)) ]

set(menuUrls)

print(time.time() - start)

In these tests, the list comprehension over regex.findall() was the fastest of the suggested search-and-transform approaches.
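For anyone wanting to repeat the comparison, here is a hedged sketch using timeit with each variant isolated in its own function. The input text is generated, since the original 500k-match file is not available, and the elided URL pieces are kept exactly as they appear above. It also uses group(1) for finditer() so that both variants return identical results.

```python
import re
import timeit

regex = re.compile(r"javascript:iframeLink\('([^']+)'\);")

# Hypothetical test data standing in for the original file.
text = "".join("javascript:iframeLink('page%d');\n" % i for i in range(20000))

PREFIX = "http://..."   # elided in the original; kept as-is
SUFFIX = "?param=..."   # elided in the original; kept as-is

def via_findall():
    # With one capturing group, findall() returns the captured strings directly.
    return [PREFIX + x + SUFFIX for x in regex.findall(text)]

def via_finditer():
    # group(1) is the captured value; group(0) would be the entire match.
    return [PREFIX + m.group(1) + SUFFIX for m in regex.finditer(text)]

# Only compare timings once both variants agree on the output.
assert via_findall() == via_finditer()

for name, func in (("findall", via_findall), ("finditer", via_finditer)):
    best = min(timeit.repeat(func, number=5, repeat=3))
    print("%-8s %.4f s" % (name, best))
```

Timing each variant separately and taking the best of several runs should give steadier numbers than a single time.time() difference, though the ratio may differ from the figures reported above.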

solution1
0 2016-12-13 23:26:21

solution2
0 ACCEPTED 2016-12-15 03:01:51
