
How to cache/memoize objects in Python?

I have some objects that are very slow to instantiate. They are representations of data loaded from external sources such as YAML files, and loading large YAML files is slow (I don't know why).

I know these objects depend on some external factors:

  • The arguments passed at object creation
  • Environment variables
  • Some external files

Ideally, I would like a transparent, boilerplate-free way to cache these objects as long as the external factors are unchanged:

import os

# `cache` is the decorator I would like to have; it does not exist yet.
@cache(depfiles=('foo',), depvars=(os.environ['FOO'],))
class Foo():
    def __init__(self, *args, **kwargs):
        with open('foo') as fd:
            self.foo = fd.read()
        self.FOO = os.environ['FOO']
        self.args = args
        self.kwargs = kwargs

The main idea is that the first time I instantiate Foo, a cache file is created with the content of the object. The next time I instantiate it (in another Python session), the cache file is used only if none of the dependencies or arguments have changed.
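As a rough sketch of the invalidation rule described above (not part of the original code), the arguments, environment variables, and file contents could be folded into a single digest that serves as the cache key. The helper name `fingerprint` and its parameters are hypothetical, and here `depvars` holds environment variable names rather than values, which is an assumption. It also assumes the arguments have stable `repr()` output.

```python
import hashlib
import os


def fingerprint(args, kwargs, depfiles=(), depvars=()):
    """Return a hex digest that changes when any listed dependency changes.

    Hypothetical helper: depvars are environment variable *names* and
    depfiles are paths whose contents are hashed.
    """
    h = hashlib.sha256()
    # Constructor arguments; kwargs are sorted so their order does not matter.
    h.update(repr((args, sorted(kwargs.items()))).encode())
    # Current values of the environment variables (None if unset).
    for name in depvars:
        h.update(repr((name, os.environ.get(name))).encode())
    # File contents; reading raw bytes is usually far cheaper than parsing.
    for path in depfiles:
        with open(path, "rb") as fd:
            h.update(os.fsencode(path))
            h.update(hashlib.sha256(fd.read()).digest())
    return h.hexdigest()
```

A cached object would then be reused only when the digest computed at instantiation time matches the one stored alongside it.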

The solution I've found so far is based on shelve:

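import shelve
import time

class Foo(object):
    _cached = False

    def __new__(cls, *args, **kwargs):
        # Return the stored instance, if any, instead of building a new one.
        cache = shelve.open('cache')
        cache_foo = cache.get(cls.__name__)
        cache.close()
        if isinstance(cache_foo, Foo):
            cache_foo._cached = True
            return cache_foo
        # object.__new__ accepts no extra arguments, so pass only cls.
        self = super(Foo, cls).__new__(cls)
        return self

    def __init__(self, *args, **kwargs):
        if self._cached:
            return  # Restored from the cache: skip the slow work.

        time.sleep(2) # Lots of work
        self.answer = 42

        # Store the freshly built instance under the class name.
        cache = shelve.open('cache')
        cache[self.__class__.__name__] = self
        cache.close()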

It works as is, but it needs too much boilerplate and it doesn't cover all the cases:

  • Conflicts when different classes have the same name
  • Checking args and kwargs
  • Checking dependencies (environment variables, external files)

Is there any native solution to achieve similar behavior in Python?

Python 3 provides the functools.lru_cache() decorator for memoizing callables, but I think you're asking to preserve the cache across multiple runs of your application. At that point requirements differ so widely that you're unlikely to find a 'one size fits all' solution.
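For reference, a minimal sketch of in-process memoization with functools.lru_cache(). The function `load_config` and its return value are made up for illustration; the cache lives only in memory and is lost when the interpreter exits.

```python
import functools

calls = []  # records real (uncached) invocations, for demonstration only


@functools.lru_cache(maxsize=None)
def load_config(path):
    """Hypothetical slow loader; the body stands in for parsing a large file."""
    calls.append(path)
    return {"path": path, "answer": 42}


first = load_config("foo.yaml")
second = load_config("foo.yaml")  # served from the in-memory cache
```

Arguments must be hashable, and the second call returns the very same object as the first, so callers should not mutate it.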

If your own answer works for you, then use it. As far as 'too much boilerplate' is concerned, I would extract the caching into a separate mixin class. In any case, the first reference to Foo in __new__ probably ought to be cls, and you can use the __qualname__ attribute instead of cls.__name__ to reduce the likelihood of class name conflicts (assuming Python 3.3 or later).
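One possible shape for such a mixin is sketched below. It is only an illustration under several assumptions: the name `PersistentCacheMixin` and the `cache_path` attribute are hypothetical, it relies on `__init_subclass__` (Python 3.6 or later), it stores the instance's `__dict__` rather than the instance itself, and it keys entries on the module, `__qualname__`, and the `repr()` of the constructor arguments. It does not track environment variables or files.

```python
import functools
import hashlib
import shelve


class PersistentCacheMixin:
    """Hypothetical mixin: caches instance state in a shelve file."""

    cache_path = "cache"  # assumed location of the shelve file

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if "__init__" not in cls.__dict__:
            return  # nothing to wrap; an inherited wrapper still applies
        slow_init = cls.__init__

        @functools.wraps(slow_init)
        def cached_init(self, *args, **kw):
            klass = type(self)
            digest = hashlib.sha256(
                repr((args, sorted(kw.items()))).encode()
            ).hexdigest()
            # Module plus __qualname__ keeps same-named classes apart.
            key = f"{klass.__module__}.{klass.__qualname__}:{digest}"
            with shelve.open(klass.cache_path) as db:
                state = db.get(key)
            if state is not None:
                self.__dict__.update(state)  # cache hit: skip the slow work
                return
            slow_init(self, *args, **kw)
            with shelve.open(klass.cache_path) as db:
                db[key] = dict(self.__dict__)  # state must be picklable

        cls.__init__ = cached_init
```

A class would opt in with `class Foo(PersistentCacheMixin): ...` and keep its `__init__` unchanged. Subclasses that call `super().__init__()` on another cached class are not handled by this sketch.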
