
pandas groupby sum if count equals condition

I'm downloading data from FRED and summing it to get annual numbers, but I don't want incomplete years. Because the series is monthly, I need the sum to be computed only when a year has 12 observations.

import pandas_datareader.data as web

mnemonic = 'RSFSXMV'

df = web.DataReader(mnemonic, 'fred', 2000, 2020)
df['year'] = df.index.year
new_df = df.groupby(["year"])[mnemonic].sum().reset_index()
print(new_df)

I don't want 2019 to show up.

In your case, we can use transform with nunique to check that each year has 12 unique months; years that do not are dropped before the groupby sum:

df['Month']=df.index.month
m=df.groupby('year').Month.transform('nunique')==12
new_df = df.loc[m].groupby(["year"])[mnemonic].sum().reset_index()
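To try the transform approach without a network call to FRED, here is a minimal sketch on synthetic data. The column name `value` and the date range are assumptions made for illustration and are not part of the original series.

```python
import pandas as pd

# Hypothetical monthly data standing in for the FRED download:
# 2017 and 2018 are complete, 2019 stops in May.
idx = pd.date_range("2017-01-01", "2019-05-01", freq="MS")
df = pd.DataFrame({"value": range(1, len(idx) + 1)}, index=idx)
df["year"] = df.index.year
df["Month"] = df.index.month

# Boolean mask aligned with df: True for rows whose year has 12 distinct months
complete = df.groupby("year")["Month"].transform("nunique") == 12

# Sum only the rows that belong to complete years
annual = df.loc[complete].groupby("year")["value"].sum().reset_index()
print(annual)
```

Because transform returns a result with the same index as the original frame, the mask can be passed straight to df.loc, and the partial year never reaches the sum.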

Or using isin:

df['Month']=df.index.month
m=df.groupby('year').Month.nunique()
new_df = df.loc[df.year.isin(m.index[m==12])].groupby(["year"])[mnemonic].sum().reset_index()

You could use the count aggregate function in the groupby:

df['year'] = df.index.year
df = df.groupby('year').agg({'RSFSXMV': 'sum', 'year': 'count'})

which will give you:

        RSFSXMV  year
year
2000    2487790   12
2001    2563218   12
2002    2641870   12
2003    2770397   12
2004    2969282   12
2005    3196141   12
2006    3397323   12
2007    3531906   12
2008    3601512   12
2009    3393753   12
2010    3541327   12
2011    3784014   12
2012    3934506   12
2013    4043037   12
2014    4191342   12
2015    4252113   12
2016    4357528   12
2017    4561833   12
2018    4810502   12
2019    2042147   5

Then simply drop the rows with a year count less than 12.
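The drop step could look like the sketch below, shown on synthetic data so it runs offline. It uses named aggregation, so the count lands in a column called `n_obs` rather than `year`; the names `value`, `total`, and `n_obs` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical monthly data: 2017 and 2018 complete, 2019 has only 5 months
idx = pd.date_range("2017-01-01", "2019-05-01", freq="MS")
monthly = pd.DataFrame({"value": [1.0] * len(idx)}, index=idx)

# Sum and observation count per calendar year
summary = (
    monthly.groupby(monthly.index.year)["value"]
    .agg(total="sum", n_obs="count")
    .rename_axis("year")
)

# Keep only years with all 12 monthly observations
complete_years = summary.loc[summary["n_obs"] == 12, ["total"]]
print(complete_years)
```

With the frame from the answer above, where the count column is named year, the equivalent filter would be df[df['year'] == 12].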
