
How to speed up a for loop?

I have this loop:

for s in sales:
    salezip = sales[s][1]
    salecount = sales[s][0]
    for d in deals:
        dealzip = deals[d][1]
        dealname = deals[d][0]
        for zips in ziplist:
            if salezip == zips[0] and dealzip == zips[1]:
                distance = zips[2]
                print "MATCH FOUND"
                if not salesdict.has_key(dealname):
                    salesdict[dealname] = [dealname,dealzip,salezip,salecount,distance]
                else:
                    salesdict[dealname][3] += salecount

And it is taking FOREVER to run. The sales dictionary has 13k entries, the deals dictionary has 1000 entries, and the ziplist has 1.8M entries. It is obviously very slow when it hits the ziplist part: I have it set to print "MATCH FOUND" when it successfully finds a match, and it hasn't printed in over 20 minutes. What can I do to make this move quicker?

Purpose of the code:

Loop through sales data which contains the amount of apples sold and the location of the purchase, pull the location and quantity info. Then, loop through apple dealers, find their location and their name. Then, loop through the ziplist data which shows distance between zip codes, sorted in ascending order of distance. The second it finds a match of the sales zip and dealer zip, it adds them to a dictionary with all of their information.

Having ziplist as an actual list of (zip1, zip2, distance) is insane - you want a data structure where you can directly find the desired item, without having to loop through the entire data set.

A dictionary with (zip1, zip2) as the key, and the distance as the value, would be enormously faster. Note that you'd need to insert each distance under the key (zip2, zip1) as well, to handle lookups in the opposite direction. Alternatively, you could sort [zip1, zip2] into numeric order before using it as a key (both on insert and lookup), so that it doesn't matter which order they are specified in.
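As a rough illustration of the sorted-key variant, here is a minimal sketch. The sample rows and the names pair_key, distance_by_pair and lookup_distance are made up for the example; only the (zip1, zip2, distance) row shape comes from the question. It also assumes zip codes are stored as strings, so the sort is lexicographic rather than numeric, which is fine as long as the same rule is used on insert and on lookup.

```python
# Hypothetical sample rows in the (zip1, zip2, distance) shape from the question.
ziplist = [
    ("10001", "10002", 1.2),
    ("10001", "94105", 2570.0),
    ("60601", "94105", 1850.0),
]

def pair_key(zip_a, zip_b):
    """Return a key that is the same whichever order the two zips are given in."""
    return tuple(sorted((zip_a, zip_b)))

# One pass over ziplist builds the lookup table.
distance_by_pair = {}
for zip_a, zip_b, distance in ziplist:
    # setdefault keeps the first distance seen for a pair, so a later
    # duplicate row does not overwrite it.
    distance_by_pair.setdefault(pair_key(zip_a, zip_b), distance)

def lookup_distance(zip_a, zip_b):
    """Return the stored distance, or None if the pair is not in the table."""
    return distance_by_pair.get(pair_key(zip_a, zip_b))
```

With a table like this, the innermost loop over ziplist in the question collapses to a single lookup_distance(salezip, dealzip) call per sale/deal pair. Note one difference: the question's code only matches rows where the sale zip comes first, while this lookup matches either order.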

The best thing you can do is to reorganize your code so that you don't have to loop so many times, and you don't have to do as many look-ups. It looks to me like you're looping over ziplist about 13 million times (once for each of the 13k sales × 1,000 deals) when a single pass would do. Here are a couple of ideas that might help:

First, create a way to quickly look up sale and deal information by zip rather than by name:

sale_by_zip = {sales[key][1]: sales[key] for key in sales}
deal_by_zip = {deals[key][1]: deals[key] for key in deals}

Then, make the iteration through the ziplist the only outer loop:

for zips in ziplist:
    salezip = zips[0]
    dealzip = zips[1]
    # Both checks are constant-time dictionary look-ups.
    if salezip in sale_by_zip and dealzip in deal_by_zip:
        distance = zips[2]
        print("MATCH FOUND")
        dealname = deal_by_zip[dealzip][0]
        salecount = sale_by_zip[salezip][0]
        # "not in" replaces the deprecated dict.has_key()
        if dealname not in salesdict:
            salesdict[dealname] = [dealname,dealzip,salezip,salecount,distance]
        else:
            salesdict[dealname][3] += salecount

This should drastically reduce the amount of processing you need to do.
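One caveat with sale_by_zip and deal_by_zip as written: a dictionary keeps only one value per key, so if several sales (or several dealers) share a zip code, the later ones overwrite the earlier ones. The sketch below is one possible way around that, grouping by zip before the single pass. The sample data and the names salecount_by_zip and dealnames_by_zip are hypothetical; the [count, zip] and [name, zip] layouts are taken from the question's code.

```python
from collections import defaultdict

# Hypothetical sample data shaped like the question's structures:
# sales[key] = [count, zip], deals[key] = [name, zip]
sales = {"s1": [5, "10001"], "s2": [3, "10001"], "s3": [7, "60601"]}
deals = {"d1": ["Acme Apples", "10002"], "d2": ["Bay Fruit", "94105"]}
ziplist = [
    ("10001", "10002", 1.2),
    ("60601", "94105", 1850.0),
    ("10001", "94105", 2570.0),
]

# Total sale count per zip, so sales sharing a zip are summed, not overwritten.
salecount_by_zip = defaultdict(int)
for count, salezip in sales.values():
    salecount_by_zip[salezip] += count

# Every dealer name registered at each zip.
dealnames_by_zip = defaultdict(list)
for name, dealzip in deals.values():
    dealnames_by_zip[dealzip].append(name)

salesdict = {}
for salezip, dealzip, distance in ziplist:  # the only pass over the big list
    # "in" on a defaultdict does not create missing keys.
    if salezip not in salecount_by_zip or dealzip not in dealnames_by_zip:
        continue
    salecount = salecount_by_zip[salezip]
    for dealname in dealnames_by_zip[dealzip]:
        if dealname not in salesdict:
            salesdict[dealname] = [dealname, dealzip, salezip, salecount, distance]
        else:
            salesdict[dealname][3] += salecount
```

The per-dealer totals come out the same as in the original triple loop. The salezip and distance recorded for each dealer are those of the first matching ziplist row, which is the nearest one if ziplist really is sorted by ascending distance.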

As others have noted, the structure of ziplist is also not the most well-suited to this problem. My suggestions assume ziplist is something you receive from an external source and cannot change the format of without making an extra pass over it. If you are building the ziplist yourself, however, consider something that would give you faster lookups.

The root of your problem is that you're processing the zip list multiple times - for every deal and then again for every sale.

One possibility is to reverse the order of your loops: iterate over the zip list on the outside, then the sales, and finally the deals. If something has to be iterated over multiple times, it is much cheaper for that to be the 1,000-entry deals dictionary than the 1.8M-entry zip list.

If there aren't a lot of matches, perhaps a membership test with "in" would be quicker, such as if dealzip in zips: and then processing the row only when it passes.


 