Fastest way to access pandas column

I am confused by the difference in performance between the various ways to access a pandas column.

In [1]: df = pd.DataFrame([[1,1,1],[2,2,2]],columns=['a','b','c'])

In [2]: %timeit df['a']
The slowest run took 75.37 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 3.12 µs per loop

In [3]: %timeit df.a
The slowest run took 5.14 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 6.59 µs per loop

In [4]: %timeit df.loc[:,'a']
10000 loops, best of 3: 55 µs per loop

I understand that the last variant is slower because it enables the values to be set, not just accessed. But why is df.a slower than df['a']? This seems to hold regardless of whether intermediate results are cached.
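To reproduce the comparison outside IPython, here is a minimal sketch using the standard-library timeit module. The loop count and the use of lambdas are choices made for this sketch, not part of the original measurement, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 2, 2]], columns=['a', 'b', 'c'])

# The three access patterns from the question, wrapped as callables.
# Each lambda adds the same small call overhead to every variant.
variants = {
    "df['a']": lambda: df['a'],
    "df.a": lambda: df.a,
    "df.loc[:, 'a']": lambda: df.loc[:, 'a'],
}

n = 10_000  # loop count: an arbitrary choice for this sketch
timings = {}
for label, fn in variants.items():
    # Best of 3 repeats, converted to microseconds per call.
    best = min(timeit.repeat(fn, number=n, repeat=3))
    timings[label] = best / n * 1e6
    print(f"{label:<16} {timings[label]:.2f} µs per loop")
```

Only the relative ordering of the three timings is meaningful here.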

Here is a link that explains the difference between . access and [] access.

Also look into the behavior of these operators in the documentation: the __getitem__ (for []) and __getattr__ (for .) methods.

. seems to access the column through an extra function call (__getattr__), thereby taking more time than [], which is accessed like a dictionary key-value lookup. This matches the timings in the question, where df.a is slower than df['a'].
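The extra step behind . access can be illustrated with a toy class. This is a sketch, not pandas' actual implementation: the Frame class and its _calls log are hypothetical names, and it assumes pandas resolves df.a in a broadly similar way, by falling back to __getattr__ and then delegating to the [] lookup.

```python
class Frame:
    """Toy stand-in for a DataFrame (hypothetical, not pandas code)."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self._calls = []  # log of which special methods were invoked

    def __getitem__(self, key):
        # frame['a'] lands here directly: one dictionary lookup.
        self._calls.append(f"__getitem__({key!r})")
        return self._columns[key]

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursing on internal names
        self._calls.append(f"__getattr__({name!r})")
        if name in self._columns:
            return self[name]  # delegate to the [] path
        raise AttributeError(name)


frame = Frame({"a": [1, 2]})

frame["a"]
print(frame._calls)  # ["__getitem__('a')"]

frame._calls.clear()
frame.a
print(frame._calls)  # ["__getattr__('a')", "__getitem__('a')"]
```

In this sketch, attribute access does everything that [] access does plus a failed normal lookup and one more method call, which is consistent with df.a being the slower of the two.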
