
Speeding Up Python Code Time

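import csv
import math
import time

# Note: df, convertCoordinates() and main() are defined elsewhere and not shown.
start = time.time()
f = open('Speed_Test.csv','r+')
coordReader = csv.reader(f, delimiter = ',')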
count = -1
successful_trip = 0
trips = 0

for line in coordReader:
    successful_single = 0
    count += 1
    R = interval*0.30
    if count == 0:
        continue
    if 26 < float(line[0]) < 48.7537144 and 26 < float(line[2]) < 48.7537144 and -124.6521017 < float(line[1]) < -68 and -124.6521017 < float(line[3]) < -68:
        y2,x2,y1,x1 = convertCoordinates(float(line[0]),float(line[1]),float(line[2]),float(line[3]))
        coords_line,interval = main(y1,x1,y2,x2)

        for item in coords_line:
            loop_count = 0
            r = 0
            min_dist = 10000

            for i in range(len(df)):
                dist = math.sqrt((item[1]-df.iloc[i,0])**2 + (item[0]-df.iloc[i,1])**2)
                if dist < R:
                    loop_count += 1
                    if dist < min_dist:
                        min_dist = dist
                        r = i
            if loop_count != 0:
                successful_single += 1
                df.iloc[r,2] += 1

        trips += 1
        if successful_single == (len(coords_line)):
            successful_trip += 1

end = time.time()
print('Percent Successful:',successful_trip/trips)
print((end - start))

I have this code, and explaining it in full would be extremely time consuming, but it doesn't run fast enough for me to compute as much as I'd like. Is there anything anyone sees off the bat that I could do to speed it up? Any suggestions would be greatly appreciated.

In essence, it reads in two lat/long coordinates and converts them to Cartesian coordinates, then goes through every coordinate along the path from the origin to the destination at an interval length that depends on the distance. As it does this, it checks each point along the trip against a data frame (df) of 300+ coordinate locations to see whether any of them is within radius R, and records the closest one.
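As a minimal sketch of the radius check described above (not the original code): the function name `nearest_within_radius` and the sample points are hypothetical, and plain `(x, y)` tuples stand in for the rows of df.

```python
import math

def nearest_within_radius(point, stations, radius):
    """Return (index, distance) of the closest station within radius, or None."""
    px, py = point
    best_index = None
    best_dist = radius  # only distances strictly below the radius qualify
    for i, (sx, sy) in enumerate(stations):
        dist = math.hypot(px - sx, py - sy)
        if dist < best_dist:
            best_index, best_dist = i, dist
    if best_index is None:
        return None
    return best_index, best_dist

# Hypothetical sample data standing in for df.
stations = [(0.0, 0.0), (10.0, 0.0), (3.0, 4.0)]
print(nearest_within_radius((3.0, 3.0), stations, 2.0))    # (2, 1.0)
print(nearest_within_radius((50.0, 50.0), stations, 2.0))  # None
```

Reading the station coordinates out of df into a plain list once, before the loops, avoids a df.iloc lookup on every comparison.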

Take advantage of any opportunity to break out of a for loop once the result is known. For example, at the end of the for line loop you check whether successful_single == len(coords_line). That can no longer be true once the statement if loop_count != 0 is False for any item, because successful_single is not incremented for that item and so can never reach len(coords_line). You could break out of the for item loop right there, since you already know it's not a "successful_trip." Note that breaking early also skips the df.iloc[r,2] += 1 updates for the remaining items of that trip, so do this only if you don't need those counts. There may be other situations like this.
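A minimal sketch of that early exit, using hypothetical names: `is_covered` stands in for the inner loop over df and simply reports whether a point has any location within R.

```python
def trip_is_successful(coords_line, is_covered):
    """Return True only if every point on the trip is covered."""
    successful_single = 0
    for item in coords_line:
        if not is_covered(item):
            # One uncovered point means successful_single can never reach
            # len(coords_line), so the remaining points are skipped.
            break
        successful_single += 1
    return successful_single == len(coords_line)

# Record which points were actually checked to show the work saved.
checked = []

def is_covered(item):
    checked.append(item)
    return item != 3  # hypothetical rule: point 3 has nothing nearby

print(trip_is_successful([1, 2, 3, 4, 5], is_covered))  # False
print(checked)                                          # [1, 2, 3]
```

Points 4 and 5 are never checked, which is where the time is saved on trips that fail early.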

Have you considered using a process pool and running these calculations in parallel?

https://docs.python.org/2/library/multiprocessing.html

Your code also suggests that R depends on interval, which is only set by the previous trip. That might create a dependency between iterations and require a sequential solution.
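A rough sketch of the pooling idea with `multiprocessing.Pool`, under some loud assumptions: `STATIONS`, `TRIPS`, `process_trip` and `summarise` are hypothetical names, each trip is assumed to carry its own interval so that R no longer depends on the previous trip, and the workers return their hits for the parent to count instead of updating df directly.

```python
import math
from multiprocessing import Pool

# Hypothetical stand-in for the 300+ reference locations in df.
STATIONS = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]

def process_trip(trip):
    """Check one trip; return (is_successful, indices of the stations hit)."""
    coords_line, interval = trip
    radius = interval * 0.30  # assumption: R uses this trip's own interval
    ok = True
    hits = []
    for x, y in coords_line:
        best_index, best_dist = None, radius
        for i, (sx, sy) in enumerate(STATIONS):
            dist = math.hypot(x - sx, y - sy)
            if dist < best_dist:
                best_index, best_dist = i, dist
        if best_index is None:
            ok = False  # no station in range for this point
        else:
            hits.append(best_index)
    return ok, hits

def summarise(results):
    """Combine per-trip results into a success rate and per-station counts."""
    counts = [0] * len(STATIONS)
    successes = 0
    for ok, hits in results:
        if ok:
            successes += 1
        for i in hits:
            counts[i] += 1
    return successes / len(results), counts

# Hypothetical trips: (points along the path, interval).
TRIPS = [
    ([(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)], 5.0),  # every point within 1.5
    ([(0.0, 1.0), (5.0, 4.0)], 5.0),               # second point too far
]

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.map(process_trip, TRIPS)
    print(summarise(results))  # (0.5, [2, 1, 1])
```

Because each worker process gets its own copy of the data, the hit counts are returned and summed in the parent. Whether this beats the sequential loop depends on how expensive each trip is relative to the cost of starting the processes and passing data between them.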


 