
Odd threading behavior in Python

I have a problem where I need to pass the index of an array to a function which I define inline. The function then gets passed as a parameter to another function which will eventually call it as a callback.

The thing is, when the code gets called, the value of the index is all wrong. I eventually solved this by creating an ugly workaround but I am interested in understanding what is happening here. I created a minimal example to demonstrate the problem:

from __future__ import print_function
import threading


def works_as_expected():
    for i in range(10):
        run_in_thread(lambda: print('the number is: {}'.format(i)))

def not_as_expected():
    for i in range(10):
        run_later_in_thread(lambda: print('the number is: {}'.format(i)))

def run_in_thread(f):
    threading.Thread(target=f).start()

threads_to_run_later = []
def run_later_in_thread(f):
    threads_to_run_later.append(threading.Thread(target=f))


print('this works as expected:\n')
works_as_expected()

print('\nthis does not work as expected:\n')
not_as_expected()
for t in threads_to_run_later: t.start()

Here is the output:

this works as expected:

the number is: 0
the number is: 1
the number is: 2
the number is: 3
the number is: 4
the number is: 6
the number is: 7
the number is: 7
the number is: 8
the number is: 9

this does not work as expected:

the number is: 9
the number is: 9
the number is: 9
the number is: 9
the number is: 9
the number is: 9
the number is: 9
the number is: 9
the number is: 9
the number is: 9

Can someone explain what is happening here? I assume it has to do with enclosing scope or something similar, but an answer with a reference that explains this dark (to me) corner of Python scoping would be valuable to me.

I'm running this on Python 2.7.11.

This is a result of how closures and scopes work in Python.

What is happening is that i is a single variable local to the not_as_expected function. Each lambda refers to that variable itself rather than to a copy of its value, so even though you're feeding a separate lambda to each thread, all the lambdas, and therefore all the threads, share the same i.

Consider this example:

def make_function():
    i = 1
    def inside_function():
        # i is looked up when the function is called, not when it is defined
        print(i)
    i = 2
    return inside_function

f = make_function()
f()

What number do you think it will print? The i = 1 before the function was defined or the i = 2 after?

It's going to print the current value of i (i.e. 2). It doesn't matter what the value of i was when the function was created; the function always uses the value i holds at the moment it is called. The same thing is happening with your lambda functions.
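To make the shared variable visible, here is a minimal sketch that inspects the closure cell directly. It relies on the `__closure__` attribute and `cell_contents`, which CPython exposes on function objects; the names `before` and `after` are only illustrative.

```python
from __future__ import print_function


def make_function():
    i = 1

    def inside_function():
        return i

    # The closure holds a cell for the variable i, not a copy of its value.
    before = inside_function.__closure__[0].cell_contents
    i = 2  # rebinding i changes what the same cell holds
    after = inside_function.__closure__[0].cell_contents
    return inside_function, before, after


f, before, after = make_function()
print(before, after, f())
```

The cell holds 1 before the rebinding and 2 afterwards, and calling f returns whatever the cell holds at call time.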

Even in your expected results you can see it didn't always work right: it skipped 5 and displayed 7 twice. In that case each lambda usually runs before the loop gets to the next iteration. But sometimes (as with the 5) the loop manages to get through two iterations before control is passed to one of the other threads, so i is incremented twice and a number is skipped. At other times (as with the 7) two threads manage to run while the loop is still in the same iteration, and since i doesn't change between the two threads, the same value gets printed.

If you instead did this:

def function_maker(i):
    return lambda: print('the number is: {}'.format(i))

def not_as_expected():
    for i in range(10):
        run_later_in_thread(function_maker(i))

Here i is a parameter of function_maker, so every call to function_maker creates a new i that is bound together with the lambda returned by that call. Each lambda then references a different variable, and it will work as expected.
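Another way to give each thread its own value, sketched here as an alternative rather than as the approach described above, is to avoid the closure entirely and pass the value through the args parameter of threading.Thread. The names `record_number`, `results` and `results_lock` are hypothetical; the results are collected in a list instead of printed so that the outcome does not depend on thread scheduling.

```python
from __future__ import print_function
import threading

results = []
results_lock = threading.Lock()


def record_number(n):
    # n is an ordinary parameter, so each thread receives its own value.
    with results_lock:
        results.append(n)


# args=(i,) is evaluated right here, while i still has this iteration's value.
threads = [threading.Thread(target=record_number, args=(i,))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

The order in which the threads append is still up to the scheduler, which is why the list is sorted before printing, but every number from 0 to 9 is recorded exactly once.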

A closure in Python captures its free variables themselves, not the values they hold at the time the closure is created. For example:

def make_closures():
    L = []

    # Captures variable L
    def push(x):
        L.append(x)
        return len(L)

    # Captures the same variable
    def pop():
        return L.pop()

    return push, pop

pushA, popA = make_closures()
pushB, popB = make_closures()

pushA(10); pushB(20); pushA(30); pushB(40)
print(popA(), popA(), popB(), popB())

will display 30, 10, 40, 20. This happens because the first pair of closures, pushA and popA, refers to one list L, while the second pair, pushB and popB, refers to another, independent list.

The important point is that within each pair the push and pop closures refer to the same list, i.e. they captured the variable L and not the value of L at the time of creation. If L is mutated by one closure, the other will see the changes.

One common mistake is to expect that

L = []
for i in range(10):
    L.append(lambda : i)
for x in L:
    print(x())

will display the numbers from 0 to 9. In fact, all of the unnamed closures here captured the same loop variable i, so all of them return the same value when called: 9, the value i holds once the loop has finished.

The common Python idiom to solve this problem is

L.append(lambda i=i: i)

i.e. relying on the fact that default values for parameters are evaluated at the time the function is created. With this approach each closure returns a different value, because each one returns its own private local variable (a parameter that has a default).
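As a small side-by-side sketch of the two behaviors (the list names `late` and `early` are only illustrative), both kinds of lambda can be built in the same loop and called after it has finished:

```python
from __future__ import print_function

late = []
early = []
for i in range(10):
    late.append(lambda: i)       # looks up the shared i when called
    early.append(lambda i=i: i)  # default evaluated now, one value per lambda

late_values = [f() for f in late]
early_values = [f() for f in early]
print(late_values)
print(early_values)
```

Every entry of late_values is 9, while early_values contains the numbers from 0 to 9 in order.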
