Selecting multiple columns from a pandas DataFrame by labels and integer positions in a single indexing operation

Question

I want to select columns 'd' and 'f', plus the first two columns (whatever their names are), in that order, from my pandas DataFrame.

In [7]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
   ...: 'c': [7, 8, 9], 'd': [10, 11, 12],
   ...: 'e': [13, 14, 15], 'f': [16, 17, 18]})

In [8]: df
Out[8]:
   a  b  c   d   e   f
0  1  4  7  10  13  16
1  2  5  8  11  14  17
2  3  6  9  12  15  18

In [9]: df[['d', 'f'] + list(df.columns[:2])]
Out[9]:
    d   f  a  b
0  10  16  1  4
1  11  17  2  5
2  12  18  3  6

Is there a better way? That is: more concise, elegant, or performant.

Answer 1

I wouldn't say this is more elegant than what you're already doing, but here are some equally concise versions of your selection:

df[['d', 'f'] + [*df][:2]]  # python >= 3.5 only

    d   f  a  b
0  10  16  1  4
1  11  17  2  5
2  12  18  3  6

This is still label-based selection. Iterating over a DataFrame yields its column labels, so [*df] unpacks df's columns into a plain list; [:2] then takes the first two labels by position, and the combined list of labels is passed to df[...]. If you have multiple independent ranges to slice, either save the output of [*df] in a variable for reuse, or see below.
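As a rough sketch of the "save it in a variable" idea, assuming the same df as in the question: the name cols and the extra 4:5 range are made up for illustration and are not part of the original selection.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
                   'c': [7, 8, 9], 'd': [10, 11, 12],
                   'e': [13, 14, 15], 'f': [16, 17, 18]})

# Iterating over a DataFrame yields its column labels,
# so [*df] is a plain Python list of labels.
cols = [*df]  # hypothetical variable name

# Reuse the saved list for several independent positional ranges:
# 'd' and 'f' by label, then the first two columns and the fifth column by position.
selected = df[['d', 'f'] + cols[:2] + cols[4:5]]
print(selected.columns.tolist())
```

Because cols is an ordinary list, each range is just a list slice, and the pieces are joined with + before being handed to df[...].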

Performance is hard to pin down, since all of these variants operate on lists of column labels (the DataFrame's metadata) rather than on the actual data. So if there is a performance bottleneck in your code, it is unlikely to be here.
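If you want to check this on your own data, something like the following could be used. It is only a sketch: the repetition count of 1000 is arbitrary, the positions 3 and 5 are hard-coded for the example df, and the timings will vary from machine to machine, so no numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
                   'c': [7, 8, 9], 'd': [10, 11, 12],
                   'e': [13, 14, 15], 'f': [16, 17, 18]})

# Two ways of making the same selection.
by_label = df[['d', 'f'] + list(df.columns[:2])]
by_position = df.iloc[:, np.r_[3, 5, :2]]  # 3 and 5 are the positions of 'd' and 'f'

# Time each variant; the repetition count is an arbitrary choice.
t_label = timeit.timeit(lambda: df[['d', 'f'] + list(df.columns[:2])], number=1000)
t_position = timeit.timeit(lambda: df.iloc[:, np.r_[3, 5, :2]], number=1000)

print(f"label-based:    {t_label:.4f} s per 1000 runs")
print(f"position-based: {t_position:.4f} s per 1000 runs")
```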

Alternatively, you can convert your labels into integer positions and index with np.r_ and df.iloc:

import numpy as np

l = df.columns.get_loc  # maps a column label to its integer position
df.iloc[:, np.r_[l('d'), l('f'), :2]]

    d   f  a  b
0  10  16  1  4
1  11  17  2  5
2  12  18  3  6
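A possible variation on the same idea, offered as a sketch rather than a recommendation: Index.get_indexer looks up several labels in one call, so you don't need one get_loc call per label. The names pos, idx and result, and the extra 4:5 range, are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
                   'c': [7, 8, 9], 'd': [10, 11, 12],
                   'e': [13, 14, 15], 'f': [16, 17, 18]})

# get_indexer maps a list of labels to their integer positions in one call.
pos = df.columns.get_indexer(['d', 'f'])

# np.r_ concatenates the position array with any number of slice ranges.
idx = np.r_[pos, :2, 4:5]

result = df.iloc[:, idx]
print(result.columns.tolist())
```

One thing to watch for: get_indexer returns -1 for a label it cannot find, and iloc treats -1 as "last column", so a misspelled label would be selected silently instead of raising an error.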

solution1
2 ACCEPTED 2020-12-27 23:33:54
