I want to keep only a handful of the males in a dataset. How do I tell pandas to keep males X, Y, and Z and drop the rest of the males? For example, say I start with this dataframe:
import pandas as pd
df1 = pd.DataFrame({'Salary':[8700,6300,4700,2100,3400], 'Gender':['Male','Female','Male','Female','Male']},index=pd.Series(['Joe Smith', 'Jane Doe', 'Rob Dole', 'Sue Pam', 'Jack Li'], name='Name'))
print(df1)
           Gender  Salary
Name
Joe Smith    Male    8700
Jane Doe   Female    6300
Rob Dole     Male    4700
Sue Pam    Female    2100
Jack Li      Male    3400
Of the males in the dataframe, I want to keep Joe Smith and Rob Dole and remove all of the other males. What's the fastest way to do this across thousands of names with gender identifiers? I have a list of about 20-25 names that I would like to keep among the thousands. My final dataframe should look like this:
           Gender  Salary
Name
Joe Smith    Male    8700
Jane Doe   Female    6300
Rob Dole     Male    4700
Sue Pam    Female    2100
Your condition is:
cond = (df1.Gender == 'Female') | (df1.index.isin(['Joe Smith', 'Rob Dole']))
and the result is simply df1[cond].
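For a longer list of 20-25 names, the same boolean mask can be built from a list variable. The following is a minimal sketch, not the only way to do it: the name keep_males is hypothetical, and the mask uses != 'Male' rather than == 'Female', which assumes you also want to keep rows with any other gender label.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Salary': [8700, 6300, 4700, 2100, 3400],
     'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']},
    index=pd.Index(['Joe Smith', 'Jane Doe', 'Rob Dole', 'Sue Pam', 'Jack Li'],
                   name='Name'),
)

# Hypothetical list of the males to keep; in practice this would hold 20-25 names.
keep_males = ['Joe Smith', 'Rob Dole']

# Keep a row if it is not male (assumption: every non-male row stays),
# or if its index label is one of the listed names.
cond = (df1['Gender'] != 'Male') | df1.index.isin(keep_males)
result = df1[cond]
print(result)
```

Because isin is vectorized, this should scale to thousands of rows without an explicit loop.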
Alternatively, you can use the .query() method:
In [14]: df1.query("Gender in ['Female','Unknown'] or Name in ['Joe Smith','Rob Dole']")
Out[14]:
           Gender  Salary
Name
Joe Smith    Male    8700
Jane Doe   Female    6300
Rob Dole     Male    4700
Sue Pam    Female    2100
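With a longer list of names, it may be more convenient to reference a Python variable inside the query string using the @ prefix instead of writing the names inline. This is a sketch under the same assumptions as the question's data; the variable name keep is hypothetical, and the index must be named Name for the query to refer to it.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Salary': [8700, 6300, 4700, 2100, 3400],
     'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']},
    index=pd.Index(['Joe Smith', 'Jane Doe', 'Rob Dole', 'Sue Pam', 'Jack Li'],
                   name='Name'),
)

# Hypothetical list of names to keep; @keep refers to this local variable.
keep = ['Joe Smith', 'Rob Dole']

# Keep females, plus anyone whose index label (Name) is in the list.
result = df1.query("Gender == 'Female' or Name in @keep")
print(result)
```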