
Difference between df.loc['col name'], df.loc[index]['col name'] and df.loc[index, 'col name'] in pandas?

I have a DataFrame df with a column named 'Store'. To retrieve the column, the following all seem to work equally well: df['Store'], df[:]['Store'], and df.loc[:, 'Store'].

What is the difference between them? And should one be preferred over the others?

Thank you.

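df.loc[index, 'col name'] is more idiomatic and preferred, especially when you also filter rows: it selects the rows and the column in a single indexing operation, whereas df.loc[index]['col name'] first builds an intermediate result and then indexes it a second time.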

Demo on a DataFrame of shape 1,000,000 x 3:

In [26]: df = pd.DataFrame(np.random.rand(10**6,3), columns=list('abc'))

In [27]: %timeit df[df.a < 0.5]['a']
10 loops, best of 3: 45.8 ms per loop

In [28]: %timeit df.loc[df.a < 0.5]['a']
10 loops, best of 3: 45.8 ms per loop

In [29]: %timeit df.loc[df.a < 0.5, 'a']
10 loops, best of 3: 37 ms per loop
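As a minimal sketch of what the timings above compare: both forms return the same values when reading, but the chained form indexes twice. The small frame and the names `mask`, `chained`, and `single` are illustrative assumptions, not part of the original benchmark.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (the benchmark above uses 10**6 rows).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 3)), columns=list("abc"))

mask = df["a"] < 0.5

# Chained indexing: first builds a filtered DataFrame with all columns,
# then selects column 'a' from that intermediate object.
chained = df[mask]["a"]

# Single .loc call: rows and column are selected in one step.
single = df.loc[mask, "a"]

# For reading, both approaches give the same values and index.
same = chained.equals(single)
print(same)
```

The single-call form is also the one to use for assignment, since chained indexing may operate on a temporary copy rather than on the original frame.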

When you need only one column and do not filter rows, as in df[:]['Store'], it is better to simply use df['Store']:

In [30]: %timeit df[:]['a']
1000 loops, best of 3: 436 µs per loop

In [31]: %timeit df.loc[:]['a']
10000 loops, best of 3: 25.9 µs per loop

In [36]: %timeit df['a'].loc[:]
10000 loops, best of 3: 26.5 µs per loop

In [32]: %timeit df.loc[:, 'a']
10000 loops, best of 3: 126 µs per loop

In [33]: %timeit df['a']
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.17 µs per loop
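The variants timed above differ in cost, not in result. A small sketch, assuming a tiny integer frame chosen only for illustration, that checks each spelling against plain `df['a']`:

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; column 'a' holds 0, 3, 6, 9.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("abc"))

direct = df["a"]

# Every unconditional single-column spelling from the timings above.
variants = {
    "df[:]['a']": df[:]["a"],
    "df.loc[:]['a']": df.loc[:]["a"],
    "df['a'].loc[:]": df["a"].loc[:],
    "df.loc[:, 'a']": df.loc[:, "a"],
}

# Compare each variant with the direct column lookup.
results = {name: series.equals(direct) for name, series in variants.items()}
for name, ok in results.items():
    print(name, ok)
```

Since all of them yield the same Series, the extra slicing only adds overhead, which is why df['a'] is the natural choice here.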

Unconditional access of multiple columns:

In [34]: %timeit df[['a','b']]
10 loops, best of 3: 22 ms per loop

In [35]: %timeit df.loc[:, ['a','b']]
10 loops, best of 3: 22.6 ms per loop
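The two multi-column forms are likewise interchangeable in what they return. A short sketch on an illustrative frame (the frame and the variable names are assumptions made for this example):

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame with columns 'a', 'b', 'c'.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("abc"))

# Selecting with a list of column names returns a DataFrame in both cases.
by_brackets = df[["a", "b"]]
by_loc = df.loc[:, ["a", "b"]]

equal = by_brackets.equals(by_loc)
print(equal)
print(list(by_loc.columns))
```

With matching results and near-identical timings, the choice between the two is mostly a matter of style when no row filter is involved.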


 