
How do I increase a variable's value when multithreading in Python?

I am trying to make a web scraper that uses multithreading to run faster. I want a counter to increase by one on every execution, but sometimes values are skipped or repeated.

import threading
num = 0

def scan():
    while True:
        global num
        num += 1
        print(num)
        open('logs.txt','a').write(str(f'{num}\n'))

for x in range(500):
    threading.Thread(target=scan).start()

Result:

2
2
5
5
7
8
10
10
12
13
13
13
16
17
19
19
22
23
24
25
26
28
29
29
31
32
33
34

Expected result:

1
2
3
4
5
6
7
8
9
10

Since the variable num is a shared resource, you need to protect it with a lock. Create one as follows:

num_lock = threading.Lock()

Every time you want to update the shared variable, the thread must first acquire the lock. While one thread holds the lock, any other thread that tries to acquire it blocks until the lock is released, so only one thread at a time can update num.

Use a with statement or a try-finally block when doing this, to guarantee that the lock is released even if the update raises an exception.

Something like this:

num_lock.acquire()
try:
    num += 1
finally:
    num_lock.release()

Or, using with:

with num_lock:
    num += 1
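As a small check of the claim that the lock is released even when the update fails, the sketch below raises an exception inside the with block and then inspects the lock. The function risky_update and the simulated error are hypothetical, added only for illustration.

```python
import threading

num = 0
num_lock = threading.Lock()

def risky_update():
    """Hypothetical update that fails while holding the lock."""
    global num
    with num_lock:
        num += 1
        raise RuntimeError("simulated failure inside the critical section")

try:
    risky_update()
except RuntimeError:
    pass

# The with block released the lock on the way out, despite the exception.
print(num_lock.locked())
```

If the lock were acquired manually without try-finally, the same failure would leave it held and every other thread would block forever on its next acquire.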

This looks like a race condition. You could use a lock so that only one thread at a time can claim a particular number. It also makes sense to use a lock for writing to the output file.
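To see why values can repeat, it may help to remember that num += 1 is a read followed by a write, and another thread can run between the two. The sketch below is a single-threaded, hand-interleaved simulation of two hypothetical threads A and B. It is not real threading code, just one possible ordering written out step by step.

```python
num = 0

# Two hypothetical threads, A and B, each want to run `num += 1`.
a_read = num        # A reads 0
b_read = num        # B reads 0 before A has written back
num = a_read + 1    # A writes 1
num = b_read + 1    # B also writes 1, so A's increment is lost

print(num)  # 1, although two increments were attempted
```

In the question's code, print(num) is a further read that happens after the increment, so a thread can also print a value that other threads have already moved past, which would explain the skipped numbers.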

Here is an example with locks. The order in which the output is written is not guaranteed, of course, but every number should be written exactly once. In this example I added a limit of 10000 so that the test code finishes and you can check that everything was written. Otherwise, at whatever point you interrupt it, it is hard to tell whether a number was skipped or a thread was simply still waiting for a lock to write its output.

my_num is not shared, so once you have claimed it inside the with num_lock section, you are free to release that lock (which protects the shared num) and continue to use my_num outside the with block while other threads acquire the lock to claim their own values. This minimises the time for which the lock is held.

import threading

num = 0
num_lock = threading.Lock()
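file_lock = threading.Lock()

def scan():
    global num

    while True:
        with num_lock:
            # check the limit while holding the lock, so that
            # threads cannot overshoot 10000 between check and increment
            if num >= 10000:
                break
            num += 1
            my_num = num

        # do whatever you want here using my_num
        # but do not touch num

        with file_lock:
            # the inner with closes the file after each write
            with open('logs.txt', 'a') as f:
                f.write(f'{my_num}\n')
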
threads = [threading.Thread(target=scan) for _ in range(500)]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
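One way to check that every number is claimed exactly once is to collect the claimed values and compare them with the expected range. The sketch below is an in-memory variant of the example above, with assumptions of my own: a list named claimed stands in for logs.txt, the limit is lowered to 1000, and only 8 threads are used so it finishes quickly. The limit is tested while holding the lock so the count cannot overshoot.

```python
import threading

LIMIT = 1000          # smaller than the 10000 above, to keep the check quick
num = 0
num_lock = threading.Lock()
claimed = []          # stands in for logs.txt
claimed_lock = threading.Lock()

def scan():
    global num
    while True:
        with num_lock:
            if num >= LIMIT:
                return
            num += 1
            my_num = num
        with claimed_lock:
            claimed.append(my_num)

threads = [threading.Thread(target=scan) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Order may vary between runs, but after sorting there should be no gaps or repeats.
print(sorted(claimed) == list(range(1, LIMIT + 1)))
```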

An important point in addition to threading.Lock:

  • Use join to make the main thread wait for the threads it started to complete.
  • Without this, the main thread may read num before all the threads have finished updating it.

Suppose I use num after the threads complete:

import threading

lock, num = threading.Lock(), 0


def operation():
    global num
    print("Operation has started")
    with lock:
        num += 1


threads = [threading.Thread(target=operation) for x in range(10)]
for t in threads:
    t.start()

for t in threads:
    t.join()

print(num)

Without join, the result is inconsistent (9 is printed occasionally, 10 otherwise):

Operation has started
Operation has started
Operation has started
Operation has started
Operation has startedOperation has started

Operation has started
Operation has started
Operation has started
Operation has started9

With join, it's consistent:

Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
10
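As a possible alternative to calling join by hand, the standard library's concurrent.futures.ThreadPoolExecutor waits for its submitted tasks when used as a context manager. This is only a sketch of the same 10-increment example, not something the original answers used. The pool size of 10 is an arbitrary choice.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

lock, num = threading.Lock(), 0

def operation():
    global num
    with lock:
        num += 1

# Leaving the with block waits for all submitted tasks to finish.
with ThreadPoolExecutor(max_workers=10) as pool:
    for _ in range(10):
        pool.submit(operation)

print(num)
```

The lock is still needed here: the executor only takes care of the waiting, not of protecting the shared variable.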

