How to combine @singledispatch and @lru_cache?

Question

I have a Python single-dispatch generic function like this:

from functools import lru_cache, singledispatch

@singledispatch
def cluster(documents, n_clusters=8, min_docs=None, depth=2):
    ...

It is overloaded like this:

@cluster.register(QuerySet)
@lru_cache(maxsize=512)
def _(documents, *args, **kwargs):
  ...

The second one basically preprocesses a QuerySet object and calls the generic cluster() function. A QuerySet is a Django object, but that should not play a role here, apart from the fact that it is hashable and thus usable with lru_cache.

The generic function cannot be cached, though, because it accepts unhashable objects such as lists as arguments. However, the overloading function can be cached because a QuerySet object is hashable. That is why I've added the @lru_cache() annotation.
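To illustrate the constraint, here is a minimal sketch. The function `total` is a hypothetical stand-in, not part of the code above. `lru_cache` builds its cache key from the call arguments, so a hashable tuple works while an unhashable list raises `TypeError`.

```python
from functools import lru_cache


@lru_cache(maxsize=512)
def total(values):
    # Hypothetical stand-in for an expensive computation.
    return sum(values)


# A tuple is hashable, so it can serve as a cache key.
total((1, 2, 3))
total((1, 2, 3))
hits_with_tuple = total.cache_info().hits

# A list is unhashable, so lru_cache cannot build a key for it.
try:
    total([1, 2, 3])
    list_error = None
except TypeError as exc:
    list_error = exc
```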

However, caching does not seem to be applied:

qs: QuerySet = [...]

start = datetime.now(); cluster(Document.objects.all()); print(datetime.now() - start)               
0:00:02.629259

I would expect the same call to complete in an instant the second time, but:

start = datetime.now(); cluster(Document.objects.all()); print(datetime.now() - start)               
0:00:02.468675

This is confirmed by the cache statistics:

cluster.registry[django.db.models.query.QuerySet].cache_info()
CacheInfo(hits=0, misses=2, maxsize=512, currsize=2)

Changing the order of the @lru_cache and the @.register annotations does not seem to make a difference.

This question is similar, but its answer does not apply at the level of an individual function.

Is it even possible to combine these two annotations on this level? If so, how?

Answer 1

hash(Document.objects.all()) == hash(Document.objects.all()) does not hold for a Django QuerySet. Each call to Document.objects.all() returns a new QuerySet object with a different hash, so lru_cache sees a new key every time and never registers a hit.
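The effect can be reproduced without Django. The sketch below assumes that a QuerySet falls back to Python's default identity-based hashing, which is consistent with the cache statistics in the question. `FakeQuerySet` and `preprocess` are hypothetical names used only for illustration.

```python
from functools import lru_cache


class FakeQuerySet:
    """Hypothetical stand-in for a QuerySet.

    It defines neither __eq__ nor __hash__, so equality and hashing
    are based on object identity (an assumption about QuerySet).
    """

    def __init__(self, model_name):
        self.model_name = model_name


@lru_cache(maxsize=512)
def preprocess(documents):
    return documents.model_name.upper()


# Two separate objects describing the same query, like two calls to .all().
preprocess(FakeQuerySet("Document"))
preprocess(FakeQuerySet("Document"))
info_fresh_objects = preprocess.cache_info()

# Reusing one and the same object does hit the cache.
qs = FakeQuerySet("Document")
preprocess(qs)
preprocess(qs)
info_reused_object = preprocess.cache_info()
```

With fresh objects the cache records two misses and no hits, matching the `CacheInfo(hits=0, misses=2, ...)` output in the question. Only the call that reuses `qs` produces a hit.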

The call Document.objects.all() doesn't hit the database until the returned QuerySet is evaluated.

"Pickling is usually used as a precursor to caching" (Django docs).

Depending on your use case, you can try caching the pickle of the QuerySet or of its query attribute.

import pickle

@cluster.register(bytes)
@lru_cache(maxsize=512)
def _(documents, *args, **kwargs):
    # Restore the original object from its pickled bytes before processing.
    documents = pickle.loads(documents)
    ...

cluster(pickle.dumps(Document.objects.all()))

or

cluster(pickle.dumps(Document.objects.all().query))
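Put together, the approach might look like the following self-contained sketch. A plain list stands in for the QuerySet, and the body of the generic `cluster()` is a placeholder, so both are assumptions made for illustration rather than the real clustering code.

```python
import pickle
from functools import lru_cache, singledispatch


@singledispatch
def cluster(documents, n_clusters=8):
    # Generic implementation (placeholder logic): it accepts unhashable
    # inputs such as lists, so it is deliberately left uncached.
    return sorted(documents)[:n_clusters]


@cluster.register(bytes)
@lru_cache(maxsize=512)
def _(documents, *args, **kwargs):
    # Cached overload: pickled bytes are hashable and compare by content.
    return cluster(pickle.loads(documents), *args, **kwargs)


data = [3, 1, 2]  # stand-in for Document.objects.all()
first = cluster(pickle.dumps(data))
second = cluster(pickle.dumps(list(data)))  # a fresh but equal object
info = cluster.registry[bytes].cache_info()
```

Here the second call is served from the cache because equal inputs pickle to identical bytes. Whether two equivalent QuerySets (or their query attributes) also pickle to identical bytes is worth verifying for your own models. Note as well that pickling a QuerySet evaluates it, whereas pickling only its query attribute does not load the results.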
