
Selecting multiple columns from pandas DataFrame by labels and integers in single indexing

I want to select columns 'd' and 'f', and the first two columns, whatever their name is, in that order, from my pandas DataFrame.

In [7]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
   ...: 'c': [7, 8, 9], 'd': [10, 11, 12],
   ...: 'e': [13, 14, 15], 'f': [16, 17, 18]})

In [8]: df
Out[8]: 
   a  b  c   d   e   f
0  1  4  7  10  13  16
1  2  5  8  11  14  17
2  3  6  9  12  15  18

In [9]: df[['d', 'f'] + list(df.columns[:2])]
Out[9]: 
    d   f  a  b
0  10  16  1  4
1  11  17  2  5
2  12  18  3  6

Is there a better way? That is: more concise, elegant, or performant.

Wouldn't say this is more elegant than what you're already doing, but here are some equally concise versions of your selection:

df[['d', 'f'] + [*df][:2]]  # python >= 3.5 only

    d   f  a  b
0  10  16  1  4
1  11  17  2  5
2  12  18  3  6

This selects columns by label. The [*df] term unpacks df's column names into a list, which is then sliced by position to take the first two names. If you have multiple independent ranges to slice, either save the output of [*df] in a variable for reuse, or see below.
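To make the reuse suggestion concrete, here is a minimal sketch using the DataFrame from the question. The variable names cols and selected and the extra range cols[4:5] are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
                   'c': [7, 8, 9], 'd': [10, 11, 12],
                   'e': [13, 14, 15], 'f': [16, 17, 18]})

# Iterating over a DataFrame yields its column labels,
# so [*df] is a plain Python list of column names.
cols = [*df]
assert cols == list(df.columns)

# Reuse the saved list for several independent positional ranges
# (the second range, cols[4:5], is a made-up extra for illustration).
selected = df[['d', 'f'] + cols[:2] + cols[4:5]]
print(selected.columns.tolist())  # ['d', 'f', 'a', 'b', 'e']
```

Because cols is an ordinary list, the usual list slicing and concatenation rules apply, and the final df[...] lookup is still by label.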

Performance is also hard to pin down, since these are all operations on lists/headers (what we call DataFrame metadata) rather than on the actual data. So if there are any performance bottlenecks in your code, they are not here.


You can convert your labels into integer positions and index with np.r_ and df.iloc:

import numpy as np
# get_loc maps a column label to its integer position
l = df.columns.get_loc
df.iloc[:, np.r_[l('d'), l('f'), :2]]

    d   f  a  b
0  10  16  1  4
1  11  17  2  5
2  12  18  3  6
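The same approach extends to several independent ranges, since np.r_ concatenates scalars and slices into a single integer array. The sketch below assumes unique column labels (so that get_loc returns a plain integer); the names loc, positions, and result and the extra range 4:5 are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
                   'c': [7, 8, 9], 'd': [10, 11, 12],
                   'e': [13, 14, 15], 'f': [16, 17, 18]})

# Assumes unique column labels, so get_loc returns a single int per label.
loc = df.columns.get_loc

# np.r_ turns the scalars and slices into one flat array of positions.
# Slices need an explicit stop, because np.r_ knows nothing about df's width.
positions = np.r_[loc('d'), loc('f'), :2, 4:5]
print(positions)  # [3 5 0 1 4]

result = df.iloc[:, positions]
print(result.columns.tolist())  # ['d', 'f', 'a', 'b', 'e']
```

Inspecting positions before passing it to df.iloc is a cheap way to check that the labels and ranges resolved to the columns you expected.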

