Is this Python code a safe way to use multithreading?

Question

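I use an application for graphics that has an embedded Python interpreter. It works exactly the same as any other Python interpreter, except that there are a few special objects.

Basically, I am trying to use Python to download a bunch of images and do other network and disk I/O. If I do this without multithreading, my application freezes (i.e. video stops playing) until the downloads are finished.

To get around this I am trying to use multithreading. However, I am not allowed to touch anything in the main process.

I have written this code. The only parts that are unique to the program are commented. me.store / me.fetch is basically a way of getting a global variable. op('files') refers to a global table.

These are the two things "in the main process" that may only be touched in a thread-safe way. I am not sure whether my code does this.

I would appreciate any input on why (or why not) this code is thread safe, and on how I can access the global variables in a thread-safe way.

One thing I am worried about is how counter is fetched multiple times by many threads. Since it is only updated after the file is written, this could cause a race condition in which different threads read the counter with the same value (and then fail to store the incremented value correctly). Also, what happens to the counter if the disk write fails?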

from urllib import request
import threading, queue, os

url = 'http://users.dialogfeed.com/en/snippet/dialogfeed-social-wall-twitter-instagram.json?api_key=ac77f8f99310758c70ee9f7a89529023'

imgs = [
    'http://search.it.online.fr/jpgs/placeholder-hollywood.jpg.jpg',
    'http://www.lpkfusa.com/Images/placeholder.jpg',
    'http://bi1x.caltech.edu/2015/_images/embryogenesis_placeholder.jpg'
]

def get_pic(url):
    # Fetch image data
    data = request.urlopen(url).read()
    # This is the part I am concerned about, what if multiple threads fetch the counter before it is updated below
    # What happens if the file write fails?
    counter = me.fetch('count', 0)

    # Download the file
    with open(str(counter) + '.jpg', 'wb') as outfile:
        outfile.write(data)
        file_name = 'file_' + str(counter)
        path = os.getcwd() + '\\' + str(counter) + '.jpg'
        me.store('count', counter + 1)
        return file_name, path


def get_url(q, results):
    url = q.get_nowait()
    file_name, path = get_pic(url)
    results.append([file_name, path])
    q.task_done()

def fetch():
    # Clear the table
    op('files').clear()
    results = []
    url_q = queue.Queue()
    # Simulate getting a JSON feed
    print(request.urlopen(url).read().decode('utf-8'))

    for img in imgs:
        # Add url to queue and start a thread
        url_q.put(img)
        t = threading.Thread(target=get_url, args=(url_q, results,))
        t.start()

    # Wait for threads to finish before updating table
    url_q.join()
    for cell in results:
        op('files').appendRow(cell)
    return

# Start a thread so that the first http get doesn't block
thread = threading.Thread(target=fetch) 
thread.start()

Answer 1

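Your code does not appear to be safe at all. Key points:

Appending to results is unsafe: two threads might try to append to the list at the same time.
Accessing and setting counter is unsafe: another thread may fetch counter before this thread has stored the new counter value.
Passing a queue of URLs is redundant: just pass a new URL to each job.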
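
The counter problem in the second point can be illustrated without the host application. The following is a minimal sketch, assuming a plain dictionary as a stand-in for me.fetch / me.store (the names fetch_value, store_value, reserve_number and reserve_many are hypothetical). It keeps the read and the write under one lock so that no two threads can reserve the same number.

```python
import threading

# Hypothetical stand-in for the host application's me.fetch / me.store.
_storage = {}

def fetch_value(key, default):
    return _storage.get(key, default)

def store_value(key, value):
    _storage[key] = value

counter_lock = threading.Lock()

def reserve_number():
    """Read the counter and store the incremented value as one locked step."""
    with counter_lock:
        current = fetch_value('count', 0)
        store_value('count', current + 1)
    return current

def reserve_many(n_threads=4, per_thread=100):
    """Reserve numbers from several threads and return all of them."""
    # One private list per thread, so no list is shared between workers.
    per_thread_results = [[] for _ in range(n_threads)]

    def worker(index):
        for _ in range(per_thread):
            per_thread_results[index].append(reserve_number())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [n for results in per_thread_results for n in results]
```

Without the lock, two threads could both read the same value before either stores the increment, which is the duplicate-number situation described in the question.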

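Another way (`concurrent.futures`)

Since you are using Python 3, why not use the concurrent.futures module, which makes your task much easier to manage? Below I have rewritten your code in a way that does not need explicit synchronisation; all of that work is handled by the futures module.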

from urllib import request
import os
import threading

from concurrent.futures import ThreadPoolExecutor
from itertools import count

url = 'http://users.dialogfeed.com/en/snippet/dialogfeed-social-wall-twitter-instagram.json?api_key=ac77f8f99310758c70ee9f7a89529023'

imgs = [
    'http://search.it.online.fr/jpgs/placeholder-hollywood.jpg.jpg',
    'http://www.lpkfusa.com/Images/placeholder.jpg',
    'http://bi1x.caltech.edu/2015/_images/embryogenesis_placeholder.jpg'
]

def get_pic(url, counter):
    # Fetch image data
    data = request.urlopen(url).read()

    # Download the file
    with open(str(counter) + '.jpg', 'wb') as outfile:
        outfile.write(data)
        file_name = 'file_' + str(counter)
        path = os.getcwd() + '\\' + str(counter) + '.jpg'
        return file_name, path

def fetch():
    # Clear the table
    op('files').clear()

    with ThreadPoolExecutor(max_workers=2) as executor:
        count_start = me.fetch('count', 0)
        # reserve these numbers for our tasks
        me.store('count', count_start + len(imgs))
        # separate fetching and storing is usually not thread safe
        # however, if only one thread modifies count (the one running fetch) then 
        # this will be safe (same goes for the files variable)

        for cell in executor.map(get_pic, imgs, count(count_start)):
            op('files').appendRow(cell)


# Start a thread so that the first http get doesn't block
thread = threading.Thread(target=fetch) 
thread.start()
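
The pairing of each URL with a reserved number in the code above relies on two properties of executor.map: it stops at the shortest input iterable, and it yields results in input order. A small sketch with no network or disk access shows this; label and label_all are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count

def label(item, number):
    # Stand-in for get_pic: just pair the arguments, no I/O.
    return 'file_' + str(number), item

def label_all(items, start):
    with ThreadPoolExecutor(max_workers=2) as executor:
        # map() zips its iterables, so the infinite count() ends with items.
        # Results come back in the order of the inputs, not completion order.
        return list(executor.map(label, items, count(start)))
```

Because the numbers are handed out by the calling thread, the worker threads never touch the shared counter.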

If more than one thread modifies count, then you should use a lock when modifying it.

For example:

lock = threading.Lock()

def fetch():
    ...
    with lock:
        # Do not release the lock between accessing and modifying count.
        # Other threads wanting to modify count, must use the same lock object (not 
        # another instance of Lock).
        count_start = me.fetch('count', 0)
        me.store('count', count_start + len(imgs))    
    # use count_start here

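The only problem is that if one job fails for some reason, you will end up with a missing file number. Any exception raised in a job is re-raised by executor.map at the point where you iterate over its results, which interrupts the loop, so you can handle it there if you need to.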
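
If one failed download should not stop the others from being recorded, one option is to submit the jobs individually and inspect each future. This is a sketch of that alternative rather than part of the code above; run_all and flaky are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(func, items, max_workers=2):
    """Run func on every item; collect failures instead of raising."""
    successes, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [(item, executor.submit(func, item)) for item in items]
        for item, future in futures:
            try:
                # result() re-raises whatever the job raised.
                successes.append(future.result())
            except Exception as exc:
                failures.append((item, exc))
    return successes, failures

def flaky(x):
    # Hypothetical job that fails for one particular input.
    if x == 2:
        raise ValueError('simulated failure')
    return x * 10
```

Calling run_all(flaky, [1, 2, 3]) returns the two successful results plus a list holding the failed item and its exception, so the caller can decide whether to retry or skip it.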

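You can work around the missing-number problem by using the tempfile module to store each file in a temporary location first, and then moving it to its permanent location.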
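
A rough sketch of the tempfile idea follows. It assumes that a number is taken only after the data has been written successfully, and it uses a module-level integer as a stand-in for the value kept through me.store; save_with_number is a hypothetical helper.

```python
import os
import tempfile
import threading

counter_lock = threading.Lock()
_next_number = 0  # hypothetical stand-in for the count kept via me.store

def save_with_number(data, directory):
    """Write data to a temp file, then rename it to '<n>.jpg' once complete."""
    global _next_number
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix='.tmp', dir=directory)
    try:
        with os.fdopen(fd, 'wb') as tmp_file:
            tmp_file.write(data)
    except Exception:
        os.remove(tmp_path)
        raise
    # Only a completed write consumes a number, so failures leave no gap.
    with counter_lock:
        number = _next_number
        _next_number += 1
    final_path = os.path.join(directory, str(number) + '.jpg')
    os.replace(tmp_path, final_path)
    return final_path
```

The trade-off is that file numbers then follow completion order rather than the order of the URL list.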

Answer 2

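If you are new to multithreading in Python, remember to take a look at the multiprocessing and threading modules.

Your code looks fine, although the code style is not very easy to read. You need to run it to see whether it works as you expect.

with will make sure your lock is released. The acquire() method is called when the block is entered, and release() is called when the block is exited.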
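
To make the equivalence concrete, here is a small sketch of the two forms side by side. The function names and the shared list are only for illustration.

```python
import threading

lock = threading.Lock()

def update_with_statement(shared):
    # The lock is acquired on entry and released on exit, even on an exception.
    with lock:
        shared.append('with')

def update_explicit(shared):
    # Roughly what the with statement does for a Lock.
    lock.acquire()
    try:
        shared.append('explicit')
    finally:
        lock.release()
```

The with form is usually preferred because the release cannot be forgotten.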

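If you add more threads, make sure they do not use the same address from the queue and that there are no race conditions (it seems this is handled by Queue.get(), but you need to run it to verify). Remember that all threads share the same process, so almost everything is shared. You do not want two threads handling the same address.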

Answer 3

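Lock does nothing at all. You only have one thread that ever calls download_job: the one you assigned to my_thread. The other thread, the main thread, calls offToOn and is finished as soon as it reaches the end of that function. So there is never a second thread that tries to acquire the lock, and therefore no second thread is ever blocked. The table you mention is, apparently, in a file that you explicitly open and close. If the operating system protects this file against simultaneous access from different programs, you can get away with this. Otherwise it is definitely unsafe, because you have not accomplished any thread synchronization.

Proper synchronization between threads requires that different threads have access to the SAME lock; i.e. one lock is accessed by multiple threads. Also note that "thread" is not a synonym for "process". Python supports both. If you are really supposed to avoid accessing the main process, you have to use the multiprocessing module to launch and manage a second process.

And this code is never going to exit, since there is always a thread running in an infinite loop (in threader).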
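
The threader function is not shown here, so the following is only a guess at its general shape: a worker that loops forever pulling jobs from a queue. One common way to let such a loop end is a sentinel value. All names below (STOP, worker_loop, run_jobs) are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to leave its loop

def worker_loop(jobs, done):
    while True:
        job = jobs.get()
        if job is STOP:
            jobs.task_done()
            break
        done.append(job * 2)  # placeholder for the real work
        jobs.task_done()

def run_jobs(values):
    jobs = queue.Queue()
    done = []
    worker = threading.Thread(target=worker_loop, args=(jobs, done))
    worker.start()
    for value in values:
        jobs.put(value)
    jobs.put(STOP)   # queued last, so every job is processed first
    worker.join()    # returns because the loop exits on the sentinel
    return done
```

Only the single worker thread appends to done here, and the caller reads it after join(), so no lock is needed in this sketch.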

Accessing a resource in a thread-safe manner requires something like this:

from threading import Lock

a_lock = Lock()

def use_resource():
    with a_lock:
        # do something with the shared resource
        pass
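Create the lock only once, outside the function that uses it. Every access to the resource anywhere in the application, from whichever thread, must acquire the same lock, either by calling use_resource or some equivalent.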

Answer 1
1 2015-05-19 22:47:14

Answer 2
0 2015-05-19 01:58:38

Answer 3
0 2015-05-19 05:41:35
