How to iterate over columns and check condition by group

I have data for many countries over a period of time (2001-2003). It looks something like this:

index year country inflation GDP
1 2001 AFG nan 48
2 2002 AFG nan 49
3 2003 AFG nan 50
4 2001 CHI 3.0 nan
5 2002 CHI 5.0 nan
6 2003 CHI 7.0 nan
7 2001 USA nan 220
8 2002 USA 4.0 250
9 2003 USA 2.5 280

I want to drop a country if any given variable has no data for it (i.e., its values are missing for all years).

In the example table above, I want to drop AFG (because all of its inflation values are missing) and CHI (because all of its GDP values are missing). I don't want to drop observation #7 just because one year is missing.

What's the best way to do that?

This should work. It filters out every country whose values are all nan in at least one of (inflation, GDP):

(
    df.groupby(['country'])
    .filter(lambda x: not x['inflation'].isnull().all() and not x['GDP'].isnull().all())
)

Note: if you have more than two columns, you can use a more general version that checks every column:

df.groupby(['country']).filter(lambda x: not x.isnull().all().any())
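As a quick check, here is a minimal sketch that rebuilds the sample table from the question and applies the filter to it. The helper name has_data_in_every_column and the variable kept are made up for illustration; the column list is an assumption based on the example data.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "year": [2001, 2002, 2003] * 3,
    "country": ["AFG"] * 3 + ["CHI"] * 3 + ["USA"] * 3,
    "inflation": [np.nan, np.nan, np.nan, 3.0, 5.0, 7.0, np.nan, 4.0, 2.5],
    "GDP": [48, 49, 50, np.nan, np.nan, np.nan, 220, 250, 280],
})

def has_data_in_every_column(group):
    # isnull().all() is True for a column that is entirely NaN in this group;
    # .any() is True if at least one checked column is like that.
    return not group[["inflation", "GDP"]].isnull().all().any()

kept = df.groupby("country").filter(has_data_in_every_column)
print(kept)
```

Only the USA rows should remain, including the 2001 row where a single inflation value is missing.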

You can also try this:

# count the non-null values per country - a count of 0 means no values in the column for that country
# (a sum of 0 would also match countries whose real values happen to add up to 0)
group_by = df.groupby(['country']).agg({'inflation': 'count', 'GDP': 'count'}).reset_index()

# extract only countries with information in both columns
has_data = (group_by['GDP'] != 0) & (group_by['inflation'] != 0)
final_countries = list(group_by.loc[has_data, 'country'])

# keep only the rows that belong to those countries
df = df[df.country.isin(final_countries)]
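The same idea can probably be written without the intermediate list of countries by broadcasting the per-country counts back onto the rows with transform. This is only a sketch on the sample data; the names counts and kept are illustrative, and the columns to check are assumed to be inflation and GDP.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "year": [2001, 2002, 2003] * 3,
    "country": ["AFG"] * 3 + ["CHI"] * 3 + ["USA"] * 3,
    "inflation": [np.nan, np.nan, np.nan, 3.0, 5.0, 7.0, np.nan, 4.0, 2.5],
    "GDP": [48, 49, 50, np.nan, np.nan, np.nan, 220, 250, 280],
})

# For every row, the number of non-null values its country has in each column
counts = df.groupby("country")[["inflation", "GDP"]].transform("count")

# Keep rows whose country has at least one value in every checked column
kept = df[(counts > 0).all(axis=1)]
print(kept)
```

Because count ignores nan, a country with real values that sum to zero is not mistaken for a country with no data.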

You could reshape the data frame from long to wide, drop nulls, and then convert back to long.

To convert from long to wide, you can use the pandas pivot functions (for example DataFrame.pivot or pivot_table). See this question too.

Here's code for dropping nulls after it's reshaped:

# Delete rows where any value is null
# (how and thresh cannot both be passed in recent pandas versions, so thresh is left out)
df.dropna(axis=0, how='any', inplace=True)

To convert back to long, you can use pd.melt.
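A rough sketch of that round trip on the sample data, under a few assumptions: the wide frame comes from DataFrame.pivot with one row per country, the return trip uses DataFrame.stack instead of pd.melt because the pivoted columns have two levels, and the null check is done per variable rather than with how='any', since a plain how='any' on this wide frame would also drop USA for its single missing 2001 inflation value. The names wide, empty and long are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "year": [2001, 2002, 2003] * 3,
    "country": ["AFG"] * 3 + ["CHI"] * 3 + ["USA"] * 3,
    "inflation": [np.nan, np.nan, np.nan, 3.0, 5.0, 7.0, np.nan, 4.0, 2.5],
    "GDP": [48, 49, 50, np.nan, np.nan, np.nan, 220, 250, 280],
})

# Long -> wide: one row per country, columns are (variable, year) pairs
wide = df.pivot(index="country", columns="year", values=["inflation", "GDP"])

# For each variable, flag the countries whose values are null in every year
empty = pd.concat(
    {var: wide[var].isna().all(axis=1) for var in ["inflation", "GDP"]},
    axis=1,
)

# Drop a country if any variable is completely empty for it
wide = wide[~empty.any(axis=1)]

# Wide -> long: move the year level of the columns back into the rows
long = wide.stack(level=1).reset_index()
print(long)
```

This is more work than the groupby versions, so it is mainly useful if the data is wanted in wide form anyway.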
