I'm trying to convert a huge number of records (a time series) to int like this:
seconds_time = int(time.mktime(time.strptime(parts[0], '%Y%m%d %H%M%S')))
Unfortunately, this line is the code's bottleneck (it increases the run time by a factor of about 20). Any suggestions to improve it?
Thanks in advance
Actually, there's a way to drastically reduce the parsing time.
import time
start = time.time()
nb_loops = 1000000
time_string = "20170101 201456"
for i in range(nb_loops):
    seconds_time = int(time.mktime(time.strptime(time_string, '%Y%m%d %H%M%S')))
print(time.time()-start)
That first loop runs in about 12 seconds. Not very good, I admit.
But since your format is simple, why not use integer conversion with slicing in a list comprehension (padding with 0 for the remaining struct_time fields: weekday, day of year, and the DST flag) and pass the result to mktime?
start = time.time()
for i in range(nb_loops):
    seconds_time = time.mktime(tuple([int(time_string[s:e]) for s,e in ((0,4),(4,6),(6,8),(9,11),(11,13),(13,15))]+[0,0,0]))
print(time.time()-start)
That runs in about 3 seconds (it avoids parsing the '%Y%m%d %H%M%S' format string on every call, which seems to take a while).
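One detail worth noting (my observation, not part of the timing above): strptime leaves tm_isdst at -1 ("let the C library decide"), while the hand-built tuple ends with 0 ("not DST"), so the two can differ by an hour when the local timezone is observing daylight saving time. Passing -1 as the last element reproduces strptime's behavior, as in this sketch:

```python
import time

time_string = "20170101 201456"

# Slice out year, month, day, hour, minute, second as integers
fields = [int(time_string[s:e])
          for s, e in ((0, 4), (4, 6), (6, 8), (9, 11), (11, 13), (13, 15))]
# fields is now [2017, 1, 1, 20, 14, 56]

# mktime ignores the weekday and day-of-year slots (it recomputes them),
# but isdst=-1 lets the C library decide DST, matching strptime's default
seconds_time = time.mktime(tuple(fields) + (0, 0, -1))
```

With isdst=-1 the result matches the strptime version exactly, regardless of the local timezone.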
Using a compiled regular expression is slightly faster still:
import re
r = re.compile("(....)(..)(..) (..)(..)(..)")
start = time.time()
for i in range(nb_loops):
    seconds_time = time.mktime(tuple(map(int,r.match(time_string).groups()))+(0,0,0))
print(time.time()-start)
Results (seconds for 1,000,000 conversions):

basic           14.41410493850708
string slicing   3.1356000900268555
regex            2.8703999519348145
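As a quick sanity check (a small sketch using the same sample string, separate from the timing loops), all three approaches can be verified to extract the same six fields:

```python
import re
import time

time_string = "20170101 201456"

# struct_time is tuple-like; the first six slots are year..second
via_strptime = time.strptime(time_string, '%Y%m%d %H%M%S')[:6]

via_slicing = tuple(int(time_string[s:e])
                    for s, e in ((0, 4), (4, 6), (6, 8), (9, 11), (11, 13), (13, 15)))

via_regex = tuple(map(int, re.match(r"(....)(..)(..) (..)(..)(..)",
                                    time_string).groups()))

assert via_strptime == via_slicing == via_regex == (2017, 1, 1, 20, 14, 56)
```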