Fastest way to access a pandas column

Question

I am confused by the difference in performance between the various ways to access a pandas column.

In [1]: df = pd.DataFrame([[1,1,1],[2,2,2]],columns=['a','b','c'])

In [2]: %timeit df['a']
The slowest run took 75.37 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 3.12 µs per loop

In [3]: %timeit df.a
The slowest run took 5.14 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 6.59 µs per loop

In [4]: %timeit df.loc[:,'a']
10000 loops, best of 3: 55 µs per loop

I understand that the last variant is slower because it enables the values to be set, not just accessed. But why is df.a slower than df['a']? This seems to be true regardless of whether intermediate results are being cached.
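For anyone who wants to reproduce the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper name best_time_us and the loop counts are my own choices, not part of the original session. Each call is wrapped in a lambda, which adds a small constant overhead to all three variants, so only the relative ordering is meaningful. Absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Same frame as in the question.
df = pd.DataFrame([[1, 1, 1], [2, 2, 2]], columns=["a", "b", "c"])

# The three access styles being compared.
candidates = {
    "df['a']": lambda: df["a"],
    "df.a": lambda: df.a,
    "df.loc[:, 'a']": lambda: df.loc[:, "a"],
}


def best_time_us(func, number=2000, repeat=5):
    """Best-of-`repeat` time per call, in microseconds (hypothetical helper)."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e6


results = {label: best_time_us(func) for label, func in candidates.items()}

for label, microseconds in results.items():
    print(f"{label:<16} {microseconds:8.2f} us per loop")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports above.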

Answer 1

Here is a link that explains the difference between . access and [] access.

Also look into the behavior of the methods behind these operators in the documentation: __getitem__ (for []) and __getattr__ (for .).

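The . access seems to reach the column through an extra function call: Python first tries normal attribute lookup, and only when that fails does it call __getattr__, which then looks the column up by name. That would explain why it takes more time than [], which looks the column up directly, like a dictionary key, and it matches the timings in the question.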
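To make the dispatch concrete, here is a minimal sketch in plain Python. The Frame class is a hypothetical stand-in and not pandas' actual implementation. It only records which special methods run for each access style, assuming attribute access falls back to item lookup the way the answer describes.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame; not pandas' real code."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self.calls = []  # records which special methods were invoked

    def __getitem__(self, name):
        # Called directly for frame[name].
        self.calls.append("__getitem__")
        return self._columns[name]

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup has failed.
        if name in ("_columns", "calls"):
            # Guard against recursion if these are not set yet.
            raise AttributeError(name)
        self.calls.append("__getattr__")
        try:
            return self[name]  # fall back to item lookup
        except KeyError:
            raise AttributeError(name) from None


frame = Frame({"a": [1, 2]})

via_item = frame["a"]
calls_for_item = list(frame.calls)

frame.calls.clear()
via_attr = frame.a
calls_for_attr = list(frame.calls)

print(calls_for_item)  # ['__getitem__']
print(calls_for_attr)  # ['__getattr__', '__getitem__']
```

In this toy model the attribute form does everything the bracket form does, plus a failed normal lookup and one more method call.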

Solution 1: 2 2017-07-10 06:15:15
