Drop rows in pandas dataframe based on fraction of total

Question

country state       year    area
usa     iowa        2000    30
usa     iowa        2001    30
usa     iowa        2002    30
usa     iowa        2003    30
usa     kansas      2000    500
usa     kansas      2001    500
usa     kansas      2002    500
usa     kansas      2003    500
usa   washington    2000    245
usa   washington    2001    245
usa   washington    2002    245
usa   washington    2003    245

In the dataframe above, I want to drop the rows for any state whose share of the total area is below 10%. In this case that would be all rows where state is iowa. What is the best way to do this in pandas? I tried groupby but am not sure how to proceed.

df.groupby('area').sum()

Answer 1

Another solution uses drop_duplicates and two rounds of boolean indexing:

# keep one row per state so each area is counted only once
a = df.drop_duplicates(['state','area'])
print(a)
  country       state  year  area
0     usa        iowa  2000    30
4     usa      kansas  2000   500
8     usa  washington  2000   245

# states whose share of the total area is at least 10%
states = a.loc[a.area.div(a.area.sum()) >= .1, 'state']
print(states)
4        kansas
8    washington
Name: state, dtype: object

# keep only the rows that belong to those states
print(df[df.state.isin(states)])
   country       state  year  area
4      usa      kansas  2000   500
5      usa      kansas  2001   500
6      usa      kansas  2002   500
7      usa      kansas  2003   500
8      usa  washington  2000   245
9      usa  washington  2001   245
10     usa  washington  2002   245
11     usa  washington  2003   245
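
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question; the helper name drop_small_states and its min_share parameter are my own for illustration, not part of pandas, and the sketch assumes each state's area is the same in every row.

```python
import pandas as pd

# Rebuild the question's sample data: each state's area repeats for every year.
df = pd.DataFrame({
    "country": ["usa"] * 12,
    "state": ["iowa"] * 4 + ["kansas"] * 4 + ["washington"] * 4,
    "year": [2000, 2001, 2002, 2003] * 3,
    "area": [30] * 4 + [500] * 4 + [245] * 4,
})


def drop_small_states(frame, min_share=0.1):
    """Hypothetical helper: keep rows whose state holds >= min_share of total area."""
    # One row per (state, area) pair, so repeated years are not double counted.
    per_state = frame.drop_duplicates(["state", "area"])
    # Each state's fraction of the summed per-state areas.
    share = per_state["area"].div(per_state["area"].sum())
    # First boolean index: pick the states to keep.
    keep = per_state.loc[share >= min_share, "state"]
    # Second boolean index: filter the original rows by those states.
    return frame[frame["state"].isin(keep)]


result = drop_small_states(df)
print(result)
```

Passing a different min_share, say 0.5, would leave only the kansas rows with this data.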

Answer 2

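Each state's area is repeated on every year's row, so summing the area column directly would count each state several times. Instead, take a single area value from each state and sum those. I take the first.

df.groupby('state').area.first().sum() is the total we normalize by (here 30 + 500 + 245 = 775).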

df[df.area.div(df.groupby('state').area.first().sum()) >= .1]

   country       state  year  area
4      usa      kansas  2000   500
5      usa      kansas  2001   500
6      usa      kansas  2002   500
7      usa      kansas  2003   500
8      usa  washington  2000   245
9      usa  washington  2001   245
10     usa  washington  2002   245
11     usa  washington  2003   245
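
To make the arithmetic behind that one-liner visible, here is a rough sketch that computes each state's share explicitly and then maps it back onto the rows. The variable names (state_area, share, row_share, filtered) are mine, and it relies on the same assumption as above: area is constant within a state, so first() is a fair representative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "country": ["usa"] * 12,
    "state": ["iowa"] * 4 + ["kansas"] * 4 + ["washington"] * 4,
    "year": [2000, 2001, 2002, 2003] * 3,
    "area": [30] * 4 + [500] * 4 + [245] * 4,
})

# One area value per state (valid only because area is constant within a state).
state_area = df.groupby("state")["area"].first()

# Normalizer: 30 + 500 + 245 = 775.
total = state_area.sum()

# Share per state: iowa ~0.039, kansas ~0.645, washington ~0.316.
share = state_area / total
print(share.round(3))

# Look up each row's state share and keep rows at or above 10%.
row_share = df["state"].map(share)
filtered = df[row_share >= 0.1]
print(filtered)
```

Only iowa falls below the 10% cutoff, which is why its four rows are the ones that disappear.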

2 answers

solution1
2 2017-01-02 10:05:40

solution2
1 ACCEPTED 2017-01-02 09:49:03
