
Python: semaphore protection on a list or a dict in a Python class

My code:

import threading

class myclass:
    def __init__(self):
        # set the semaphore via object.__setattr__ so the wrapped
        # __setattr__ below does not try to acquire it before it exists
        object.__setattr__(self, 'semaphore', threading.Semaphore())
        self.x = {}
        self.y = []

    def __semaphore(func):
        def wrapper(*args, **kw):
            args[0].semaphore.acquire()
            ret = func(*args, **kw)
            args[0].semaphore.release()
            return ret
        return wrapper

    @__semaphore
    def __setattr__(self, name, value):
        super().__setattr__(name, value)

    @__semaphore
    def save_to_disk(self):
        """ access to my_class.x and my_class.y """

my_class = myclass()
my_class.x['a'] = 123

With the code above, I'm trying to use a semaphore to protect my x and y whenever save_to_disk is called. But when I call my_class.x['a'] = 123 , my_class.__setattr__ is not called, so my x is not protected.

I have two questions:

  • When I call my_class.x['a'] = 123 , which Python function is called?
  • How can I protect x and y in my_class only, not every list and dict globally? My x and y might also contain a list or a dict inside them.

Update: let me clarify the idea behind the code above. I want to create a kernel-like AI. The AI must do two jobs at the same time: one is collecting all the information that I give it; the other is saving that information to disk once a threshold is reached (I do not want it to eat all my RAM).

What I have tried

  • Creating classes that inherit from dict and list to override {} and [] , but that requires me to replace every {} and [] . That is not efficient.
  • Currently I'm trying to create a read/write semaphore and then override dict.__setitem__ , list.append , etc. But I do not know what problems will come up.
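For illustration, the subclassing idea from the first bullet would look roughly like this (a sketch; `LockedDict` and the shared `lock` are names made up here, not from the original code):

```python
import threading

class LockedDict(dict):
    """A dict whose mutating methods take a shared lock (illustrative sketch)."""
    def __init__(self, lock, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._lock = lock

    def __setitem__(self, key, value):
        with self._lock:          # the write happens while holding the lock
            super().__setitem__(key, value)

    def __delitem__(self, key):
        with self._lock:
            super().__delitem__(key)

lock = threading.RLock()
d = LockedDict(lock)
d['a'] = 123
```

As the bullet notes, the drawback is that every literal {} or [] has to be replaced by such a class, and nested plain containers stay unprotected.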

TLDR: It is not useful to do this on myclass methods alone, since myclass is not the only class involved. my_class.x['a'] = 123 is equivalent to this:

def set_x_a(obj: myclass, value):
    x = obj.__getattribute__('x')  # fetch `x` via a `myclass` method
    x.__setitem__('a', value)      # set `'a'` via a `type(x)` method

set_x_a(my_class, 123)

Note how the call to my_class.__getattribute__ has already completed by the time x.__setitem__ is called. Any synchronisation internal to myclass methods is thus of the wrong scope.
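To make the scope problem concrete, one can count how often the lock is actually taken (a sketch; `CountingLock` and `MyClass` are illustrative names, not the original code):

```python
import threading

class CountingLock:
    """Context-manager lock that counts its acquisitions (illustrative)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.acquired = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquired += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()

class MyClass:
    def __init__(self):
        self.lock = CountingLock()
        self.x = {}

    def save_to_disk(self):
        with self.lock:  # method-level synchronisation, as in the question
            pass         # would write self.x here

obj = MyClass()
obj.save_to_disk()        # acquires the lock once
obj.x['a'] = 123          # never touches the lock at all

print(obj.lock.acquired)  # prints 1
```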


You can protect class fields from concurrent access by only giving access to them in synchronised blocks.

Python's basic means of synchronising a block is the with statement, which can for example be used with threading locks . To simplify creating a custom synchronised block, contextlib.contextmanager builds a context manager from a single generator (instead of a class with two methods). Finally, a property allows adding behaviour, such as synchronisation, to attribute access.

import sys
import threading
from contextlib import contextmanager

class Synchronized:
    def __init__(self):
        self._x = {}  # actual data, stored internally
        self._mutex = threading.RLock()

    @property
    @contextmanager
    def x(self):           # public behaviour of data
        with self._mutex:  # only give access when synchronised
            yield self._x

    def save(self, file=sys.stdout):
        with self._mutex:  # only internally access when synchronised
            file.write(str(self._x))

The important change is that the dict attribute is no longer directly exposed. It is only available while holding a lock .

synced = Synchronized()
with synced.x as x:
    x['a'] = 123
    x['b'] = 42

synced.save()

You can extend this pattern to additional attributes, and improve the protection of attributes. For example, you can yield a copy or a collections.ChainMap of self._x , and explicitly commit it to the internal state at the end of the block -- thus invalidating the effect of any references kept after the block.
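A minimal sketch of that copy-and-commit variant, assuming the same layout as the class above (the commit step and variable names are additions for illustration):

```python
import threading
from contextlib import contextmanager

class Synchronized:
    """Copy-and-commit variant of the class above (a sketch)."""
    def __init__(self):
        self._x = {}
        self._mutex = threading.RLock()

    @property
    @contextmanager
    def x(self):
        with self._mutex:
            staged = dict(self._x)  # hand out a copy, not the internal dict
            yield staged
            self._x = dict(staged)  # commit a fresh copy back under the lock

s = Synchronized()
with s.x as view:
    view['a'] = 123

view['b'] = 42          # a reference kept after the block only hits a stale copy

with s.x as x:
    snapshot = dict(x)  # internal state is {'a': 123} -- the stray write is gone
```

Because a fresh copy is committed, any reference that escapes the block can no longer reach the internal state.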

Q1

When I call my_class.x['a'] = 123 , which Python function is called?

Attribute lookup calls __getattribute__(self, item) first; the dict it returns then handles the item assignment with its own __setitem__ .

About your idea

I want to create a kernel-like AI. The AI must do two jobs at the same time: one is collecting all the information that I give it; the other is saving that information to disk once a threshold is reached (I do not want it to eat all my RAM).

The problem is that two threads want to share the same variable, right?

If so, maybe you can let only one thread work at a time; then you don't need to worry about the resource being changed underneath you.

For example:

import threading
import numpy as np
from time import time, sleep


def get_data(share_list, share_dict):
    num_of_data = 0
    while num_of_data < 6:
        t_s = time()
        if is_writing_flag.is_set():
            sleep(REFRESH_TIME)
            continue

        while 1:
            data = np.random.normal(1, 1, (10,))
            threshold = all(data > 1.6)
            if threshold:
                share_list.append(data)
                share_dict['time'] = time() - t_s
                num_of_data += 1
                is_writing_flag.set()
                break
    while is_writing_flag.is_set():  # let the keeper drain the last item first
        sleep(REFRESH_TIME)
    close_keeper_flag.clear()


def data_keeper(share_list, share_dict):
    while close_keeper_flag.is_set():
        while is_writing_flag.is_set():
            # save as csv, json, yaml...
            print(share_list.pop())
            print(share_dict['time'])
            is_writing_flag.clear()
        sleep(REFRESH_TIME)


def main():
    share_list = []
    share_dict = {}
    td_collect_data = threading.Thread(target=get_data, name='collect some data', args=[share_list, share_dict])
    td_data_keeper = threading.Thread(target=data_keeper, name='save data.', args=[share_list, share_dict])
    for th in (td_collect_data, td_data_keeper):
        th.start()


if __name__ == '__main__':
    REFRESH_TIME = 0.2
    is_writing_flag = threading.Event()
    is_writing_flag.clear()

    close_keeper_flag = threading.Event()
    close_keeper_flag.set()
    main()

But I would prefer using asyncio to handle this, for example:

import asyncio
import numpy as np
from time import time


async def take_data(num_of_data):
    count = 0
    t_s = time()
    while 1:
        if count == num_of_data:
            break
        data = await collect_data()
        cost_time = time() - t_s
        yield list(data), dict(time=cost_time)
        t_s = time()
        count += 1


async def collect_data():
    while 1:
        data = np.random.normal(1, 1, (10,))
        threshold = all(data > 1.6)
        if threshold:
            break
    return data


async def ai_process():
    async for res_list, res_dict in take_data(5):
        print(res_dict['time'])
        # save_to_desktop()
        ...


def main():
    # asyncio.run replaces the deprecated get_event_loop / asyncio.wait([coro]) pattern
    asyncio.run(ai_process())


if __name__ == '__main__':
    main()

If this is still not useful to you at all, I will delete the answer. If you have any questions, please let me know, thank you.
