
Load data in background thread with Python 3

I am a bit frustrated about not being able to solve this seemingly simple problem:

I have a function that takes some time to load data:

import time

def import_data(id):
    time.sleep(5)  # simulate a slow load
    return 'data' + str(id)

A DataModel class calls this function and manages two datasets.

class DataModel():
    def __init__(self):
        self._data_1 = import_data(1)
        self._data_2 = import_data(2)

    def retrieve_data_1(self):
        return self._data_1

    def retrieve_data_2(self):
        return self._data_2

Now, the main UI creates the DataModel, which calls import_data for both datasets and blocks the UI thread:

def main_ui():
    # Each dataset takes 5 seconds to load, so this blocks the main UI thread for 10 seconds
    dm = DataModel()

    # Other stuff is happening. This time could be used to load data in the background
    time.sleep(2)

    # Retrieve the first dataset
    data_1 = dm.retrieve_data_1()

    # User interaction. This time could be used to load even larger datasets
    time.sleep(10)

    # Retrieve the second dataset
    data_2 = dm.retrieve_data_2()

I want the datasets to be loaded in the background to reduce the time the UI is blocked. My idea is something like this pseudocode:

class DataModel():
    def __init__(self):
        self._data_1 = Thread(import_data(1)).start()
        self._data_2 = Thread(import_data(2)).start()

    def retrieve_data_1(self):
        return self._data_1.wait_for_result()

    def retrieve_data_2(self):
        return self._data_2.wait_for_result()

The idea is that each import_data call runs in a separate thread and returns a Future-like object. Each retrieve_data call then either blocks the main thread until the result is ready or returns it instantly.
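For reference, a minimal sketch of that idea with plain threading: a small hand-rolled wrapper (BackgroundTask is an invented name, not a standard class) that starts a worker thread immediately and joins it on retrieval. Note it does not propagate exceptions from the worker; a real implementation should use concurrent.futures instead.

```python
import threading
import time

def import_data(id):
    time.sleep(0.1)  # shortened from 5 s for the demo
    return 'data' + str(id)

class BackgroundTask:
    """Hand-rolled future: runs fn(*args) in a thread, result() joins."""
    def __init__(self, fn, *args):
        self._result = None
        self._thread = threading.Thread(target=self._run, args=(fn,) + args)
        self._thread.start()

    def _run(self, fn, *args):
        self._result = fn(*args)

    def result(self):
        self._thread.join()  # block until the worker thread finishes
        return self._result

class DataModel:
    def __init__(self):
        # Both loads start immediately and run concurrently
        self._data_1 = BackgroundTask(import_data, 1)
        self._data_2 = BackgroundTask(import_data, 2)

    def retrieve_data_1(self):
        return self._data_1.result()

    def retrieve_data_2(self):
        return self._data_2.result()
```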

Is there an easy way to implement this in Python 3.x with threading and/or asyncio? Thanks in advance!

(Edit: syntax correction)

Use the concurrent.futures module, which is designed for exactly this kind of usage:

import concurrent.futures

_pool = concurrent.futures.ThreadPoolExecutor()

class DataModel():
    def __init__(self):
        self._data_1 = _pool.submit(import_data, 1)
        self._data_2 = _pool.submit(import_data, 2)

    def retrieve_data_1(self):
        return self._data_1.result()

    def retrieve_data_2(self):
        return self._data_2.result()

If your functions are defined at module level and your data is picklable, you can seamlessly switch from ThreadPoolExecutor to ProcessPoolExecutor and get true (process-based) parallelism, sidestepping the GIL.
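A minimal sketch of that switch, assuming import_data is defined at module level (so it can be pickled for the worker processes). The `if __name__ == '__main__'` guard is required on platforms that use the spawn start method (Windows, recent macOS):

```python
import concurrent.futures
import time

def import_data(id):  # must be top-level to be picklable for worker processes
    time.sleep(0.1)
    return 'data' + str(id)

def main():
    # Each submit returns a Future; work runs in separate processes
    with concurrent.futures.ProcessPoolExecutor() as pool:
        f1 = pool.submit(import_data, 1)
        f2 = pool.submit(import_data, 2)
        print(f1.result(), f2.result())

if __name__ == '__main__':  # guard needed for the spawn start method
    main()
```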
