I'm doing something like this to sum up a number of elements of a line:
for line in open(filename, 'r'):
    big_list = line.strip().split(delim)
    a = sum(int(float(item)) for item in big_list[start:end] if item)
    # do some other stuff
This is done line by line on a big file, where some items may be missing, i.e., equal to ''. If I use the statement above to compute a, the script becomes much slower than without it. Is there a way to speed it up?
As Padraic commented, use filter to trim out empty strings, then drop "if item":
>>> import timeit
>>> timeit.timeit("sum(int(float(item)) for item in ['','3.4','','','1.0'] if item)",number=10000)
0.04612559381553183
>>> timeit.timeit("sum(int(float(item)) for item in filter(None, ['','3.4','','','1.0']))",number=10000)
0.04827789913997549
>>> sum(int(float(item)) for item in filter(None, ['','3.4','','','1.0']))
4
The filter version is slightly slower in this small example, but it might be faster on your data. Measure to see.
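A five-element list is too small to say much, so here is a rough sketch of how the two variants could be timed on something closer to a real line. The `fields` list and the two function names are made up for illustration; substitute a slice of your own data.

```python
import timeit

# Hypothetical sample data: 1000 fields, half of them empty.
fields = ['', '3.4', '', '1.0'] * 250

def with_condition(items):
    # Original form: skip empty strings with an "if" clause.
    return sum(int(float(item)) for item in items if item)

def with_filter(items):
    # Variant: let filter(None, ...) drop the empty strings.
    return sum(int(float(item)) for item in filter(None, items))

# Both variants must agree before timing them makes sense.
assert with_condition(fields) == with_filter(fields)

for func in (with_condition, with_filter):
    seconds = timeit.timeit(lambda: func(fields), number=200)
    print(func.__name__, seconds)
```

The timings depend on the machine and on how many fields are empty, so treat the printed numbers as relative, not absolute.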
This isn't tested, but intuitively I would expect that skipping the intermediate float conversion would help. You want the integer to the left of the decimal point, so I would try extracting it directly with a regular expression (this assumes non-negative values in plain decimal notation):
import re
pattern = re.compile(r"\d+")
Then replace the float parsing with the regex match:
sum(int(pattern.search(item).group(0)) for item in big_list[start:end] if item)
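Since the regex shortcut is untested, a quick check like the one below may help decide whether it is safe for your data. It is only a sketch: `via_float` and `via_regex` are names invented here, and the sample strings are assumptions about what the file might contain.

```python
import re

pattern = re.compile(r"\d+")

def via_float(item):
    # Original approach: parse as float, then truncate toward zero.
    return int(float(item))

def via_regex(item):
    # Shortcut: take the first run of digits found in the string.
    return int(pattern.search(item).group(0))

# Plain non-negative decimals: both approaches agree.
for item in ["3.4", "1.0", "12", "007.9"]:
    assert via_float(item) == via_regex(item)

# Inputs where the shortcut gives a different answer than float parsing.
for item in ["-3.4", "1e3", ".5"]:
    print(item, via_float(item), via_regex(item))
```

The regex loses the sign, ignores exponents, and reads ".5" as 5, so it only matches the float version when every value looks like "digits.digits" or a plain integer.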
If you don't need to keep the old decimal strings, you could also get these on the fly as you build big_list. For example, say we have the line "6.0,,1.2,3.0,". We could get matches like this:
delim = ","
pattern = re.compile(r"(\d+)\.\d+|" + re.escape(delim) + re.escape(delim) + r"|$")
The results of this pattern on the line would be ['6', '', '1', '3', ''], which could then be sliced and filtered as usual without the need for float parsing:
for line in open(filename, 'r'):
    big_list = pattern.findall(line)
    a = sum(int(item) for item in big_list[start:end] if item)
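Because big_list[start:end] is sliced by position, the findall result has to line up field for field with what split would produce. The sketch below compares the two on a few sample lines; the helper names and the sample lines are assumptions made for illustration.

```python
import re

delim = ","
pattern = re.compile(r"(\d+)\.\d+|" + re.escape(delim) + re.escape(delim) + r"|$")

def fields_via_split(line):
    # Reference behaviour: split, then truncate each non-empty field.
    return [str(int(float(item))) if item else ""
            for item in line.strip().split(delim)]

def fields_via_regex(line):
    # One-pass alternative using the pattern above.
    return pattern.findall(line)

sample_lines = [
    "6.0,,1.2,3.0,",    # the example line from the answer
    "6.0,,,1.2",        # two empty fields in a row
    "6.0,,1.2,3.0,\n",  # same as the first, with the newline still attached
]
for line in sample_lines:
    print(repr(line), fields_via_split(line) == fields_via_regex(line))
```

The lists match for the first line but not for the other two: a run of three delimiters yields only one empty match, and `$` matches both before and after a trailing newline. If your file can contain either case, the slice indices will shift, so it is worth running a check like this before relying on the regex version.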