
Removing rows from a dataframe by identifying rows to keep

I'm interested in keeping a handful of people who are males from a dataset. How do I let Python know I want to keep X, Y, and Z males and to drop the rest of the males? For example, say I start with this dataframe:

import pandas as pd
df1 = pd.DataFrame({'Salary':[8700,6300,4700,2100,3400], 'Gender':['Male','Female','Male','Female','Male']},index=pd.Series(['Joe Smith', 'Jane Doe', 'Rob Dole', 'Sue Pam', 'Jack Li'], name='Name'))

print(df1)

           Gender  Salary
Name                     
Joe Smith    Male    8700
Jane Doe   Female    6300
Rob Dole     Male    4700
Sue Pam    Female    2100
Jack Li      Male    3400

Of the males in the dataframe, I want to keep Joe Smith and Rob Dole and remove all of the other males. What's the fastest way to do this across thousands of names with gender identifiers? I have a list of about 20-25 names that I would like to keep among the thousands. My final dataframe should look like this:

           Gender  Salary
Name                     
Joe Smith    Male    8700
Jane Doe   Female    6300
Rob Dole     Male    4700
Sue Pam    Female    2100

Your condition is:

cond = (df1.Gender == 'Female') | (df1.index.isin(['Joe Smith', 'Rob Dole']))

and you then simply select the matching rows with df1[cond].
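With 20-25 names to keep, it may be easier to hold them in a list and build the mask from that. The following is a minimal sketch, assuming the names live in the index as in the question; `males_to_keep` and `drop_mask` are hypothetical names chosen for illustration. It marks the males that are not in the list and drops only those rows, so any non-male row is left untouched.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Salary': [8700, 6300, 4700, 2100, 3400],
     'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']},
    index=pd.Series(['Joe Smith', 'Jane Doe', 'Rob Dole', 'Sue Pam', 'Jack Li'],
                    name='Name'))

# Hypothetical list of the males to keep (20-25 names in the real data).
males_to_keep = ['Joe Smith', 'Rob Dole']

# True for rows that are male AND not in the keep list: these get dropped.
drop_mask = (df1['Gender'] == 'Male') & ~df1.index.isin(males_to_keep)

# Invert the mask to keep everything else.
result = df1[~drop_mask]
print(result)
```

Both `isin` and the boolean operators are vectorized, so this should scale to thousands of rows without an explicit loop.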

Alternatively, you can use the .query() method:

In [14]: df1.query("Gender in ['Female','Unknown'] or Name in ['Joe Smith','Rob Dole']")
Out[14]:
           Gender  Salary
Name
Joe Smith    Male    8700
Jane Doe   Female    6300
Rob Dole     Male    4700
Sue Pam    Female    2100
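If the names are already in a Python list, the query string does not need to spell them out: `.query()` can reference local variables with the `@` prefix. A short sketch under the same assumptions as the question's data; `males_to_keep` is a hypothetical variable name.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Salary': [8700, 6300, 4700, 2100, 3400],
     'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']},
    index=pd.Series(['Joe Smith', 'Jane Doe', 'Rob Dole', 'Sue Pam', 'Jack Li'],
                    name='Name'))

# Hypothetical list of the males to keep.
males_to_keep = ['Joe Smith', 'Rob Dole']

# 'Name' refers to the named index; '@males_to_keep' pulls in the local list.
result = df1.query("Gender == 'Female' or Name in @males_to_keep")
print(result)
```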
