Expensive operation done once in a function that is called many times, Python 3
I have a long list of groups in JSON and I want a little utility:
def verify_group(group_id):
    group_ids = set()
    for grp in groups:
        group_ids.add(grp.get("pk"))
    return group_id in group_ids
The obvious approach is to load the set outside the function, or otherwise declare a global -- but let's assume I don't want a global variable.
In statically typed languages I can declare the set static and, I believe, that will accomplish my aim. How would one do something similar in Python? That is: the first call initializes the set, group_ids, and subsequent calls use the set initialized in the first call.
BTW, when I use the profilestats package to profile this little code snippet, I see these frightening results:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    833    0.613    0.001    1.059    0.001  verify_users_groups.py:25(verify_group)
2558976    0.253    0.000    0.253    0.000  {method 'get' of 'dict' objects}
2558976    0.193    0.000    0.193    0.000  {method 'add' of 'set' objects}
I tried adding functools.lru_cache -- but that type of optimization doesn't address my present question -- how can I load the set group_ids once inside a def block?
Thank you for your time.
There isn't an equivalent of static; however, you can achieve the same effect in different ways.
One way is to abuse the infamous mutable default argument:
def verify_group(group_id, group_ids=set()):
    if not group_ids:
        group_ids.update(grp.get("pk") for grp in groups)
    return group_id in group_ids
This, however, allows the caller to change that value (which may be a feature or a bug for you).
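To make that caveat concrete, here is a small sketch with hypothetical sample data (the `groups` list below is an assumption, standing in for the JSON). The default set is filled on the first call and reused afterwards, but a caller can pass a set of their own and bypass it:

```python
groups = [{"pk": 1}, {"pk": 2}]  # hypothetical sample data


def verify_group(group_id, group_ids=set()):
    # The default set is created once, at function definition time.
    if not group_ids:
        group_ids.update(grp.get("pk") for grp in groups)
    return group_id in group_ids


first = verify_group(1)                 # fills the shared default set
cached = verify_group.__defaults__[0]   # the same set object, now populated
overridden = verify_group(1, {99})      # caller-supplied set is used instead
print(first, sorted(cached), overridden)  # True [1, 2] False
```

Note that if `groups` were empty, the default set would stay empty and be rebuilt on every call.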
I usually prefer using a closure:
def make_group_verifier():
    group_ids = {grp.get("pk") for grp in groups}
    def verify_group(group_id):
        # nonlocal group_ids  # if you need to change its value
        return group_id in group_ids
    return verify_group

verify_group = make_group_verifier()
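A self-contained sketch of the closure approach, using a small hypothetical `groups` list in place of the JSON data. The `build_count` counter is not part of the technique; it is added here only to confirm that the set is built once:

```python
# Hypothetical sample data standing in for the groups loaded from JSON.
groups = [{"pk": 1, "name": "admins"}, {"pk": 2, "name": "staff"}, {"pk": 7, "name": "guests"}]

build_count = 0  # illustration only: counts how often the set is built


def make_group_verifier():
    global build_count
    build_count += 1
    # Built once, when the factory runs; the inner function closes over it.
    group_ids = {grp.get("pk") for grp in groups}

    def verify_group(group_id):
        return group_id in group_ids

    return verify_group


verify_group = make_group_verifier()

results = [verify_group(pk) for pk in (1, 2, 3, 7)]
print(results)      # [True, True, False, True]
print(build_count)  # 1
```

The set lives in the enclosing scope of `verify_group`, so it is neither a module-level global nor reachable through the function's parameters.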
Python is an OOP language. What you describe is an instance method. Initialize the class with the set and call the method on the instance.
class GroupVerifier:
    def __init__(self):
        self.group_ids = {grp.get("pk") for grp in groups}

    def verify(self, group_id):
        # could be __call__
        return group_id in self.group_ids
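The comment above mentions `__call__`. A minimal sketch of that variant follows; as an assumption for the sake of a self-contained example, it takes the groups as a constructor argument (the original reads a module-level `groups`) and uses a hypothetical `sample_groups` list:

```python
class GroupVerifier:
    def __init__(self, groups):
        # Build the set once per instance.
        self.group_ids = {grp.get("pk") for grp in groups}

    def __call__(self, group_id):
        # Defining __call__ lets the instance be used like a function.
        return group_id in self.group_ids


# Hypothetical sample data standing in for the JSON groups.
sample_groups = [{"pk": 10}, {"pk": 20}]
verify_group = GroupVerifier(sample_groups)

print(verify_group(10))  # True
print(verify_group(99))  # False
```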
I'd also like to add that it depends on the API design. You could let the caller take responsibility for pre-computing and providing the value if they want performance. This is the choice taken by, for example, random.choices. The cum_weights parameter isn't necessary, but it allows the user to remove the cost of computing that array on every call in performance-critical code. So instead of having a mutable default argument, you use None as the default and compute the set only if the value passed is None; otherwise you assume the caller did the work for you.
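A minimal sketch of that `None`-default design, assuming a hypothetical `groups` list and a `precomputed` variable name chosen for illustration:

```python
# Hypothetical sample data standing in for the JSON groups.
groups = [{"pk": 1}, {"pk": 2}, {"pk": 3}]


def verify_group(group_id, group_ids=None):
    if group_ids is None:
        # Slow path: the caller did not precompute, so build the set on this call.
        group_ids = {grp.get("pk") for grp in groups}
    return group_id in group_ids


# Convenient, but rebuilds the set on every call.
print(verify_group(2))  # True

# Performance-critical callers compute the set once and pass it in.
precomputed = {grp.get("pk") for grp in groups}
checks = [verify_group(pk, precomputed) for pk in (1, 4)]
print(checks)  # [True, False]
```

Here the function keeps no hidden state at all; whether the work is done once or many times is the caller's decision.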