
Creating a Python list comprehension with an if and break with nested for loops

I noticed from this answer that the code

for i in userInput:
    if i in wordsTask:
        a = i
        break

can be written as a list comprehension in the following way:

a = next(i for i in userInput if i in wordsTask)

I have a similar problem which is that I would like to write the following (simplified from original problem) code in terms of a list comprehension:

new_list = []
for i in xrange(N):
    point = Point(long_list[i], lat_list[i])
    for feature in feature_list:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            new_list.append(feature['properties'])
            break  # stop at the first polygon that contains this point

I expect each point to be associated with a single polygon from the feature list. Hence, once a polygon that contains the point is found, break is used to move on to the next point. Therefore, new_list will have exactly N elements.

I wrote it as a list comprehension as follows:

new_list = [feature['properties'] for i in xrange(1000) for feature in feature_list if shape(feature['geometry']).contains(Point(long_list[i], lat_list[i]))]

Of course, this doesn't take into account the break in the if statement, and therefore takes significantly longer than using nested for loops. Using the advice from the above-linked post (which I probably don't fully understand), I did

new_list2 = next(feature['properties'] for i in xrange(1000) for feature in feature_list if shape(feature['geometry']).contains(Point(long_list[i], lat_list[i])))

However, new_list2 has far fewer than N elements (in my case, N=1000 and new_list2 had only 5 elements).

Question 1: Is it even worth doing this as a list comprehension? The only reason is that I read that list comprehensions are usually a bit faster than nested for loops. With 2 million data points, every second counts.

Question 2: If so, how would I go about incorporating the break statement in a list comprehension?

Question 3: What was the error going on with using next in the way I was doing?

Thank you so much for your time and kind help.

List comprehensions are not necessarily faster than a for loop. If you have a pattern like:

some_var = []
for ...:
    if ...:
        some_var.append(some_other_var)

then yes, the list comprehension is faster than the series of .append() calls. Your case is different, however. For one thing, the argument to next(...) is actually a generator expression, not a list comprehension, because it has no [ and ] around it. Beyond that:

  • You aren't actually creating a list (and therefore not using .append()). You are merely getting one value.
  • Your generator expression calls Point(long_list[i], lat_list[i]) once for each feature for each i in xrange(N), whereas the loop calls it only once for each i.
  • And, of course, your generator expression doesn't work.

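Why doesn't your generator expression work? Because it finds only the first value overall. The loop, on the other hand, finds the first value for each i. You see the difference? The generator expression breaks out of both loops, but the for loop breaks out of only the inner one. new_list2 is therefore a single feature['properties'] value rather than a list, which would explain the 5 elements if that value is a dictionary with 5 keys.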
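The difference is easy to reproduce on toy data. The following is only a sketch: integers stand in for points and range objects stand in for polygons (both are assumptions made for illustration, not the shapely objects from the question).

```python
# Toy stand-ins (assumptions): integers play the role of points,
# range objects play the role of polygons.
points = [3, 12, 25]
regions = [range(0, 10), range(10, 20), range(20, 30)]

# Flat generator expression: next() stops at the first match overall,
# so only a single value comes back.
first_overall = next(r for p in points for r in regions if p in r)

# Nested loops with break: one match per point.
first_per_point = []
for p in points:
    for r in regions:
        if p in r:
            first_per_point.append(r)
            break  # leaves only the inner loop

print(first_overall)         # range(0, 10)
print(len(first_per_point))  # 3
```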


If you want a slight improvement in performance, use itertools.izip() (or just zip() in Python 3):

from itertools import izip

for long, lat in izip(long_list, lat_list):
    point = Point(long, lat)
    ...

I doubt that complex list comprehensions or generator expressions are much faster than nested loops when they run the same algorithm (e.g. visit the same number of values). To get a definitive answer, implement the solution both ways and time each one on your real data.
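One possible way to run such a comparison is with the standard timeit module. This is a minimal sketch on toy data, assuming integers in place of points and range objects in place of polygons; the function names with_loops and with_next are made up for the example, and the timings will only be meaningful once the real Point and shape calls are substituted in.

```python
import timeit

# Toy stand-ins (assumptions): integers for points, ranges for polygons.
points = list(range(0, 1000))
regions = [range(start, start + 100) for start in range(0, 1000, 100)]

def with_loops():
    # Nested for loops with a break after the first match.
    result = []
    for p in points:
        for r in regions:
            if p in r:
                result.append(r.start)
                break
    return result

def with_next():
    # List comprehension with next() over an inner generator expression.
    return [next(r.start for r in regions if p in r) for p in points]

# Both versions must agree before their timings mean anything.
assert with_loops() == with_next()

print("loops:", timeit.timeit(with_loops, number=20))
print("next: ", timeit.timeit(with_next, number=20))
```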

As for how to short-circuit the inner loop but not the outer one, you'll need to put the next call inside the main list comprehension, with a separate generator expression inside of it:

new_list = [next(feature['properties'] for feature in feature_list
                 if shape(feature['geometry']).contains(Point(long, lat)))
            for long, lat in zip(long_list, lat_list)]

I've changed one other thing: rather than indexing long_list and lat_list with indexes from a range, I'm using zip to iterate over them in parallel.
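One caveat with this pattern: next() raises StopIteration when the inner generator finds no match, that is, when a point lies in no polygon. If that can happen in your data, passing a default as the second argument to next() avoids the exception. A small sketch on toy data (integers for points and range objects for polygons are assumptions for illustration):

```python
# Toy stand-ins (assumptions): integers for points, ranges for polygons.
points = [3, 12, 99]  # 99 falls in no region
regions = [range(0, 10), range(10, 20)]

# The second argument to next() is returned when the generator is exhausted,
# so an unmatched point yields None instead of raising StopIteration.
matches = [next((r for r in regions if p in r), None) for p in points]

print(matches)  # [range(0, 10), range(10, 20), None]
```

Note that the result then still has one entry per point, with None marking the unmatched ones, whereas the original loop would simply append nothing for such a point.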

Note that if creating the Point objects over and over ends up taking too much time, you can streamline that part of the code by adding in another nested generator expression that creates the points and lets you bind them to a (reusable) name:

new_list = [next(feature['properties'] for feature in feature_list
                 if shape(feature['geometry']).contains(point))
            for point in (Point(long, lat) for long, lat in zip(long_list, lat_list))]
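To see what the extra generator expression saves, the sketch below counts how often a point object is constructed in each variant. CountingPoint, contains and the interval "polygons" are hypothetical stand-ins invented for this illustration; they are not part of shapely.

```python
# Hypothetical stand-in for shapely's Point that counts constructions.
class CountingPoint:
    created = 0

    def __init__(self, x, y):
        CountingPoint.created += 1
        self.x, self.y = x, y

# Toy "polygons" (assumption): each one is an (x_min, x_max) interval.
intervals = [(0, 10), (10, 20), (20, 30)]
xs = [25, 5, 15]
ys = [0, 0, 0]

def contains(interval, point):
    return interval[0] <= point.x < interval[1]

# Point built inside the inner condition: once per (point, interval) test.
CountingPoint.created = 0
inline = [next(iv for iv in intervals if contains(iv, CountingPoint(x, y)))
          for x, y in zip(xs, ys)]
inline_count = CountingPoint.created

# Point built in a nested generator: once per coordinate pair.
CountingPoint.created = 0
hoisted = [next(iv for iv in intervals if contains(iv, pt))
           for pt in (CountingPoint(x, y) for x, y in zip(xs, ys))]
hoisted_count = CountingPoint.created

print(inline_count, hoisted_count)  # 6 3
```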
