How do I write an efficient custom version of __dict__?
I want to implement a to_dict function that behaves similarly to the built-in __dict__ attribute but allows me to add custom logic. (It is used to construct a pandas DataFrame; see the example below.)
However, I found that my to_dict function is ~25% slower than __dict__ even when the two do exactly the same thing. How can I improve my code?
import pandas as pd

class Foo:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    def to_dict(self):
        # Builds and returns a new dict on every call.
        return {
            'a': self.a,
            'b': self.b,
            'c': self.c,
            'd': self.d,
        }

list_test = [Foo(i, i, i, i) for i in range(100000)]
%%timeit
pd.DataFrame(t.to_dict() for t in list_test)
# Output: 199 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.DataFrame(t.__dict__ for t in list_test)
# Output: 156 ms ± 948 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
A digression from this question, but related to my final goal: what is the most efficient way to construct a pandas DataFrame from a list of custom objects? My current approach is taken from https://stackoverflow.com/a/54975755/1087924
__dict__ does not "convert" an object to a dict (unlike __int__, __str__, etc.); it is where the object's (writable) attributes are stored.
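To make the distinction concrete, here is a minimal sketch (the Point class and its attribute names are made up for illustration). It shows that accessing __dict__ hands back the same underlying dict each time rather than building a new one, and that changes made through either side are visible on the other:

```python
class Point:  # hypothetical class, not from the question
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
d = p.__dict__

# Every access returns the very same dict object, not a fresh copy.
same_object = d is p.__dict__

# Writing to the dict changes the attribute on the instance...
d["x"] = 99

# ...and setting a new attribute on the instance shows up in the dict.
p.z = 3

print(same_object, p.x, sorted(d))
```

This is why reading __dict__ is cheap: nothing is converted or copied at access time.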
I think your implementation is reasonably efficient. Consider this simplified example:
import dis

class Foo:
    def __init__(self, a):
        self.a = a

    def to_dict(self):
        return {'a': self.a}

foo = Foo(1)

dis.dis(foo.to_dict)       # bytecode of the method body
dis.dis('foo.__dict__')    # bytecode of a plain attribute access
We can see that to_dict looks up the attributes and builds a new dict on every call (plus you'd need to make the call to .to_dict itself, which is not shown here):
7 0 LOAD_CONST 1 ('a')
2 LOAD_FAST 0 (self)
4 LOAD_ATTR 0 (a)
6 BUILD_MAP 1
8 RETURN_VALUE
while accessing an existing attribute is much simpler:
1 0 LOAD_NAME 0 (foo)
2 LOAD_ATTR 1 (__dict__)
4 RETURN_VALUE
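The difference can also be measured without pandas in the picture. The following is only a rough micro-benchmark sketch using the standard timeit module; the iteration count is an arbitrary choice, and the absolute numbers will vary by machine and Python version:

```python
import timeit

class Foo:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    def to_dict(self):
        return {'a': self.a, 'b': self.b, 'c': self.c, 'd': self.d}

foo = Foo(1, 2, 3, 4)

# Time the method call (attribute lookups + building a new dict + call overhead)
t_method = timeit.timeit("foo.to_dict()", globals={"foo": foo}, number=200_000)

# Time the plain attribute access (returns the already existing dict)
t_attr = timeit.timeit("foo.__dict__", globals={"foo": foo}, number=200_000)

print(f"to_dict(): {t_method:.4f} s   __dict__: {t_attr:.4f} s")
```

Both expressions yield equal dicts here, but only to_dict() allocates a new one per call.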
You could, however, store your custom representation on the instance, which gives exactly the same bytecode as accessing __dict__, but then you need to update it correctly on every change to Foo (which will cost some speed and memory). If updates are uncommon in your use case, this could be an acceptable trade-off.
In your example, a simple option is to override __getattribute__, but I'm guessing Foo has other attributes, so having setters is probably going to be more convenient:
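class Foo:
    def __init__(self, a):
        self.dict = {}   # cached custom representation; must exist before the setter runs
        self.a = a       # goes through the setter below

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        self._a = value
        self.dict['a'] = value   # keep the cached dict in sync

foo = Foo(1)
print(foo.dict)  # {'a': 1}
foo.a = 10
print(foo.dict)  # {'a': 10}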
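If writing one property per field becomes tedious, a variation on the same idea is to hook __setattr__ instead. This is a different technique from the property setters above, shown only as a sketch: the _exported tuple and the note attribute are hypothetical names introduced here to separate fields that belong in the cached dict from those that do not.

```python
class Foo:
    # Hypothetical: names of the attributes mirrored into the cached dict.
    _exported = ("a", "b")

    def __init__(self, a, b, note=""):
        # Create the cache first, bypassing our own __setattr__ hook.
        object.__setattr__(self, "dict", {})
        self.a = a
        self.b = b
        self.note = note  # not exported, so it stays out of self.dict

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self._exported:
            self.dict[name] = value  # keep the cached dict in sync

foo = Foo(1, 2, note="internal")
foo.a = 10
print(foo.dict)
```

The trade-off is the same as with setters: every assignment pays a small extra cost so that reading foo.dict stays a plain attribute access.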