I am applying a pct_change calculation to a pandas DataFrame. Everything works fine when the month column is ordered. When it is not, the calculation comes out incorrect.
Here is my code now:
from datetime import date, datetime
from pandas import DataFrame
data = [
('product_a','1/31/2014',53)
,('product_b','1/31/2014',44)
,('product_c','1/31/2014',36)
,('product_a','11/30/2013',52)
,('product_b','11/30/2013',43)
,('product_c','11/30/2013',35)
,('product_a','3/31/2014',50)
,('product_b','3/31/2014',41)
,('product_c','3/31/2014',34)
,('product_a','12/31/2013',50)
,('product_b','12/31/2013',41)
,('product_c','12/31/2013',34)
,('product_a','2/28/2014',52)
,('product_b','2/28/2014',43)
,('product_c','2/28/2014',35)
]
product_df = DataFrame( data, columns=['prod_desc','activity_month','prod_count'] )
for index, row in product_df.iterrows():
    row['activity_month'] = datetime.strptime(row['activity_month'], '%m/%d/%Y')
    product_df.loc[index, 'activity_month'] = date.strftime(row['activity_month'], '%Y-%m-%d')
product_df['pct_ch'] = product_df.groupby('prod_desc')['prod_count'].pct_change()
product_df = product_df.sort_values(['prod_desc','activity_month'])
What I get returned:
    prod_desc  activity_month  prod_count     pct_ch
3   product_a      2013-11-30          52  -0.018868
9   product_a      2013-12-31          50   0.000000
0   product_a      2014-01-31          53        NaN
12  product_a      2014-02-28          52   0.040000
6   product_a      2014-03-31          50  -0.038462
4   product_b      2013-11-30          43  -0.022727
10  product_b      2013-12-31          41   0.000000
1   product_b      2014-01-31          44        NaN
13  product_b      2014-02-28          43   0.048780
7   product_b      2014-03-31          41  -0.046512
5   product_c      2013-11-30          35  -0.027778
11  product_c      2013-12-31          34   0.000000
2   product_c      2014-01-31          36        NaN
14  product_c      2014-02-28          35   0.029412
8   product_c      2014-03-31          34  -0.028571
The calculations here are out of order as the pct_change for the first month of each product should be NaN.
I think the issue is that the pct_change calculation does not include 'activity_month' in the groupby. When I try to add it, I get the following output:
product_df['pct_ch'] = product_df.groupby(['prod_desc','activity_month'])['prod_count'].pct_change()
    prod_desc  activity_month  prod_count  pct_ch
3   product_a      2013-11-30          52     NaN
9   product_a      2013-12-31          50     NaN
0   product_a      2014-01-31          53     NaN
12  product_a      2014-02-28          52     NaN
6   product_a      2014-03-31          50     NaN
4   product_b      2013-11-30          43     NaN
10  product_b      2013-12-31          41     NaN
1   product_b      2014-01-31          44     NaN
13  product_b      2014-02-28          43     NaN
7   product_b      2014-03-31          41     NaN
5   product_c      2013-11-30          35     NaN
11  product_c      2013-12-31          34     NaN
2   product_c      2014-01-31          36     NaN
14  product_c      2014-02-28          35     NaN
8   product_c      2014-03-31          34     NaN
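The all-NaN result above can be reproduced in isolation. The sketch below uses a small hypothetical frame `df` (a four-row subset of the values in the question, not the full data) to show that grouping by both columns leaves exactly one row per group, so pct_change has no previous row to compare against:

```python
import pandas as pd

# Hypothetical four-row subset of the question's data, for illustration only.
df = pd.DataFrame(
    {
        "prod_desc": ["product_a", "product_a", "product_b", "product_b"],
        "activity_month": ["2013-11-30", "2013-12-31", "2013-11-30", "2013-12-31"],
        "prod_count": [52, 50, 43, 41],
    }
)

# Every (prod_desc, activity_month) pair occurs exactly once ...
group_sizes = df.groupby(["prod_desc", "activity_month"]).size()
print(group_sizes.max())  # 1

# ... so each group has no previous row and pct_change yields NaN everywhere.
pct = df.groupby(["prod_desc", "activity_month"])["prod_count"].pct_change()
print(pct.isna().all())  # True
```

In other words, the month column defines the order within each product, not a further level of grouping.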
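I think the issue is that the groupby computes the percentage change between adjacent rows with the same prod_desc, and those rows are not in date order when you perform the operation. Moving the sort above the groupby fixes that. Adding 'activity_month' to the groupby does not help: every (prod_desc, activity_month) pair then forms its own single-row group, so there is no previous row and pct_change returns NaN everywhere.
You can also remove the for loop and convert the dates in one line using pandas.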
import pandas as pd
data = [
('product_a','1/31/2014',53)
,('product_b','1/31/2014',44)
,('product_c','1/31/2014',36)
,('product_a','11/30/2013',52)
,('product_b','11/30/2013',43)
,('product_c','11/30/2013',35)
,('product_a','3/31/2014',50)
,('product_b','3/31/2014',41)
,('product_c','3/31/2014',34)
,('product_a','12/31/2013',50)
,('product_b','12/31/2013',41)
,('product_c','12/31/2013',34)
,('product_a','2/28/2014',52)
,('product_b','2/28/2014',43)
,('product_c','2/28/2014',35)
]
product_df = pd.DataFrame( data, columns=['prod_desc','activity_month','prod_count'])
product_df['activity_month'] = pd.to_datetime(product_df['activity_month'],
                                              format='%m/%d/%Y')
product_df = product_df.sort_values(['prod_desc','activity_month'])
product_df['pct_ch'] = product_df.groupby('prod_desc')['prod_count'].pct_change()
I think this should produce the answer you want.
    prod_desc  activity_month  prod_count     pct_ch
3   product_a      2013-11-30          52        NaN
9   product_a      2013-12-31          50  -0.038462
0   product_a      2014-01-31          53   0.060000
12  product_a      2014-02-28          52  -0.018868
6   product_a      2014-03-31          50  -0.038462
4   product_b      2013-11-30          43        NaN
10  product_b      2013-12-31          41  -0.046512
1   product_b      2014-01-31          44   0.073171
13  product_b      2014-02-28          43  -0.022727
7   product_b      2014-03-31          41  -0.046512
5   product_c      2013-11-30          35        NaN
11  product_c      2013-12-31          34  -0.028571
2   product_c      2014-01-31          36   0.058824
14  product_c      2014-02-28          35  -0.027778
8   product_c      2014-03-31          34  -0.028571
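If you would rather keep the frame in its original row order, a possible variation is to sort only a temporary copy and let pandas align the result back on the index. This is a minimal sketch, assuming a unique index; the frame `df` is a hypothetical six-row subset of the data above, and `first_rows` is just an illustrative name for the sanity check:

```python
import pandas as pd

# Hypothetical subset of the data above, deliberately not in date order.
df = pd.DataFrame(
    {
        "prod_desc": ["product_a", "product_b", "product_a",
                      "product_b", "product_a", "product_b"],
        "activity_month": pd.to_datetime(
            ["1/31/2014", "1/31/2014", "11/30/2013",
             "11/30/2013", "12/31/2013", "12/31/2013"],
            format="%m/%d/%Y",
        ),
        "prod_count": [53, 44, 52, 43, 50, 41],
    }
)

# Sort a temporary copy by date and compute the change per product.
# Column assignment aligns on the index, so df keeps its original row order.
df["pct_ch"] = (
    df.sort_values("activity_month")
    .groupby("prod_desc")["prod_count"]
    .pct_change()
)

# Sanity check: the earliest month of each product has no previous value.
first_rows = df.loc[df.groupby("prod_desc")["activity_month"].idxmin()]
print(first_rows["pct_ch"].isna().all())  # True
```

The same check on the first row of each product is a quick way to confirm that the ordering was right before trusting the rest of the column.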