Drop rows in pandas dataframe based on fraction of total

country state       year    area
usa     iowa        2000    30
usa     iowa        2001    30
usa     iowa        2002    30
usa     iowa        2003    30
usa     kansas      2000    500
usa     kansas      2001    500
usa     kansas      2002    500
usa     kansas      2003    500
usa     washington  2000    245
usa     washington  2001    245
usa     washington  2002    245
usa     washington  2003    245

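In the dataframe above, I want to drop the rows for every state whose area is less than 10% of the total area (counting each state's area once). In this case that would be all rows with state iowa, since 30 out of 30 + 500 + 245 = 775 is about 3.9%. What is the best way to do this in pandas? I tried groupby but am not sure how to proceed.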

df.groupby('area').sum()
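For reference, here is a minimal sketch that rebuilds the sample frame and computes each state's share of the total area. The construction code and the names `per_state` and `share` are illustrative and not part of the original post; the sketch assumes the area is identical on every row of a state, as it is in the sample data.

```python
import pandas as pd

states = ["iowa", "kansas", "washington"]
areas = [30, 500, 245]
years = [2000, 2001, 2002, 2003]

# Rebuild the sample frame: one row per (state, year), area repeated per state.
df = pd.DataFrame(
    [("usa", s, y, a) for s, a in zip(states, areas) for y in years],
    columns=["country", "state", "year", "area"],
)

# Group by state (not by area) and take one area value per state,
# then divide by the total to get each state's share.
per_state = df.groupby("state")["area"].first()
share = per_state / per_state.sum()

# iowa is about 0.039, kansas about 0.645, washington about 0.316
print(share.round(3))
```

Grouping by `state` rather than `area` is what makes the total count each state once; summing the raw `area` column would count every state four times.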

Another solution uses drop_duplicates to get one row per state, then boolean indexing twice: once to pick the states to keep, and once to filter the original rows:

a = df.drop_duplicates(['state','area'])
print (a)
  country       state  year  area
0     usa        iowa  2000    30
4     usa      kansas  2000   500
8     usa  washington  2000   245

states = a.loc[a.area.div(a.area.sum()) >= .1, 'state']
print (states)
4        kansas
8    washington
Name: state, dtype: object

print (df[df.state.isin(states)])
   country       state  year  area
4      usa      kansas  2000   500
5      usa      kansas  2001   500
6      usa      kansas  2002   500
7      usa      kansas  2003   500
8      usa  washington  2000   245
9      usa  washington  2001   245
10     usa  washington  2002   245
11     usa  washington  2003   245
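The steps above can be wrapped in a small function. This is only a sketch: `drop_small_states` and its `threshold` parameter are hypothetical names introduced here, and the function assumes each state carries a single area value on all of its rows.

```python
import pandas as pd

def drop_small_states(df, threshold=0.1):
    """Keep rows whose state holds at least `threshold` of the total area.

    Hypothetical helper; assumes `area` is identical on every row of a state.
    """
    # One row per (state, area) pair, so each state's area is counted once.
    per_state = df.drop_duplicates(["state", "area"])
    share = per_state["area"] / per_state["area"].sum()
    # States whose share reaches the threshold.
    keep = per_state.loc[share >= threshold, "state"]
    # Filter the original frame, preserving its index and row order.
    return df[df["state"].isin(keep)]

df = pd.DataFrame({
    "country": ["usa"] * 12,
    "state": ["iowa"] * 4 + ["kansas"] * 4 + ["washington"] * 4,
    "year": [2000, 2001, 2002, 2003] * 3,
    "area": [30] * 4 + [500] * 4 + [245] * 4,
})

result = drop_small_states(df)
print(result)
```

With the sample data this keeps the eight kansas and washington rows (index 4 through 11) and drops the four iowa rows.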

The area is repeated on every row of a state, so the total should count one area value per state. Any of the values within a state will do; I take the first.

  • groupby('state').area.first().sum() is the total area we normalize by (30 + 500 + 245 = 775 here).

df[df.area.div(df.groupby('state').area.first().sum()) >= .1]

   country       state  year  area
4      usa      kansas  2000   500
5      usa      kansas  2001   500
6      usa      kansas  2002   500
7      usa      kansas  2003   500
8      usa  washington  2000   245
9      usa  washington  2001   245
10     usa  washington  2002   245
11     usa  washington  2003   245
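The one-liner above relies on every row of a state carrying the same area. A hedged sketch that checks that assumption before filtering could look like this; the names `total`, `mask`, `kept` and `dropped` are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["usa"] * 12,
    "state": ["iowa"] * 4 + ["kansas"] * 4 + ["washington"] * 4,
    "year": [2000, 2001, 2002, 2003] * 3,
    "area": [30] * 4 + [500] * 4 + [245] * 4,
})

# Assumption check: each state must have exactly one distinct area value,
# otherwise "first" would pick an arbitrary one.
assert (df.groupby("state")["area"].nunique() == 1).all()

# Normalizer: one area per state, summed (30 + 500 + 245 = 775).
total = df.groupby("state")["area"].first().sum()

# Row-wise mask: a row is kept when its state's area is at least 10% of the total.
mask = df["area"].div(total) >= 0.1
kept = df[mask]
dropped = df[~mask]

print(dropped["state"].unique())  # the states that fell below the threshold
```

Keeping the mask in its own variable makes it easy to inspect which rows were removed, which here are only the iowa rows.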
