简体   繁体   中英

Select two sets of columns by column names in Pandas

Take the DataFrame in the answer of Loc vs. iloc vs. ix vs. at vs. iat? for example.

df = pd.DataFrame(
{'age':[30, 2, 12, 4, 32, 33, 69],
 'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
 'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
 'height':[165, 70, 120, 80, 180, 172, 150],
 'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
 'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
 index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia']
)

Now I want all columns except 'food' and 'height' .

I thought something like df.loc[:,['age':'color', 'score':'state']] would work, but Python returns SyntaxError: invalid syntax .

I am aware of that there is one way to work around: df.drop(columns = ['food', 'height']) . However, in my real life situation, I have hundreds of columns to be dropped. Typing out all column names is so inefficient.

I am expecting something similar with dplyr::select(df, -(food:height)) or dplyr::select(df, age:color, score:state) in R language.

Also have read Selecting/Excluding sets of columns in Pandas .

First, find all columns lying between food and height (inclusive).

c = df.iloc[-1:0].loc[:, 'food':'height'].columns

Next, filter with difference / isin / setdiff1d -

df[df.columns.difference(c)]

Or,

df.loc[:, ~df.columns.isin(c)]

Or,

df[np.setdiff1d(df.columns, c)]

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

First get positions of columns names by Index.get_loc and then use numpy.r_ for join all slicers together:

a = np.r_[df.columns.get_loc('age'):df.columns.get_loc('color')+1, 
          df.columns.get_loc('score'):df.columns.get_loc('state')+1]

df = df.iloc[:, a]
print (df)
           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

One option for flexible column selection is with select_columns from pyjanitor :

# pip install pyjanitor
import pandas as pd
import janitor

df.select_columns(slice('age', 'color'), slice('score', 'state'))

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

df.select_columns(slice('food', 'height'), invert = True)

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM