pandas pct_change unrealistic values

Question

I run the following Python code:

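import numpy as np

# Fill gaps: linear interpolation, then forward/backward fill for the edges.
data_extracted = data_extracted.interpolate(method='linear', axis=0).ffill().bfill()
data_extracted = data_extracted.replace([np.inf, -np.inf], np.nan).fillna(0)
# Row-over-row percentage change; infinities (x / 0) become NaN, then 0.
data_pct_change = data_extracted.pct_change(axis=0).replace([np.inf, -np.inf], np.nan)
data_pct_change = data_pct_change.fillna(0)
print(data_pct_change)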
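
To see what each step of this pipeline does, here is a minimal sketch on a short Series made of the first five values of column 400115897090 from the sample input below (the name s is illustrative only). Note that interpolate(method='linear') ignores the index and treats the rows as equally spaced, and it leaves a leading NaN unfilled, which is why bfill() is needed.

```python
import numpy as np
import pandas as pd

# First five values of column 400115897090 from the sample input.
s = pd.Series([np.nan, 5914.0, np.nan, np.nan, 6214.0])

# Linear interpolation fills the interior gap as if rows were equally spaced;
# the leading NaN is left alone because nothing precedes it.
interpolated = s.interpolate(method='linear')
print(interpolated.tolist())  # [nan, 5914.0, 6014.0, 6114.0, 6214.0]

# ffill() has nothing left to fill here; bfill() copies 5914.0 into row 0.
filled = interpolated.ffill().bfill()
print(filled.tolist())  # [5914.0, 5914.0, 6014.0, 6114.0, 6214.0]

# pct_change() computes value / previous value - 1; the first row has no
# predecessor, so it is NaN, which fillna(0) turns into 0.
print(filled.pct_change().fillna(0).round(4).tolist())
# [0.0, 0.0, 0.0169, 0.0166, 0.0164]
```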

This is an example of the input (data_extracted):

ARTICLE_NUMBER    400115897090  500109158982  
DATE                                                                     
2016-01-18            NaN            NaN        
2016-02-01         5914.0         8776.0        
2016-02-15            NaN            NaN           
2016-02-29            NaN         4402.0          
2016-03-14         6214.0         6880.0         
2016-04-04         6766.0         7942.0          
2016-04-11         6454.0         7528.0         
2016-04-25         6070.0         7534.0          
2016-05-16         6778.0         7066.0         
2016-05-30         6856.0            NaN            
2016-06-20         7132.0         7138.0        
2016-06-27         7384.0         7426.0        
2016-07-18         8830.0         8614.0        
2016-08-01         9448.0         9166.0        
2016-08-15         8824.0         9676.0         
2016-08-22         8500.0         8974.0        
2016-09-12         6226.0         6868.0        
2016-10-03         6754.0         7426.0        
2016-11-07            NaN         8296.0        
2016-11-14         7858.0         8116.0         
2016-11-21         8212.0         9070.0         
2016-12-05            NaN            NaN           
2016-12-19         9428.0         8284.0

After running the code above, I get the following result:

ARTICLE_NUMBER   400115897090  500109158982  
DATE                                                                     
2016-01-18       0.000000       0.000000        
2016-02-01       0.000000       0.000000         
2016-02-15       0.000000       0.000000      
2016-02-29       0.000000       0.000000     
2016-03-14       0.000000       0.000000     
2016-04-04       0.000000       0.000000      
2016-04-11       0.000000       0.000000       
2016-04-25       0.000000       0.000000      
2016-05-16       0.000000       0.000000      
2016-05-30       0.000000       0.000000       
2016-06-20       0.000000       0.000000       
2016-06-27       0.000000       0.000000       
2016-07-18       0.000000       0.000000      
2016-08-01       0.000000       0.000000       
2016-08-15       0.000000       0.000000      
2016-08-22      13.384615     252.600000      
2016-09-12      -0.221925       0.807571      
2016-10-03       0.407216       0.172339      
2016-11-07      -0.104396      -0.109044      
2016-11-14       0.053170       0.299499       
2016-11-21      -0.029773      -0.020572      
2016-12-05       0.111074      -0.798490      
2016-12-19       0.099970       4.998371

Why do I get such wrong results? I know about floating-point precision, but this is really weird. For example, for 2016-08-22 the output for column 500109158982 is 252.6, a huge increase, although the value actually went down from 9676 to 8974. That is definitely wrong, and so are all the 0.000000 values. Can anyone explain why? This is Python 3 with pandas 0.22.0. Thanks a lot!
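
For reference, a minimal self-contained sketch that rebuilds the sample input above and runs the same pipeline on it. Writing the column labels as strings is an assumption (in the real data they may be integers). On this sample, the 2016-08-22 value for 500109158982 comes out at about -0.0726 and several of the early rows are non-zero, which differs from the output shown above.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.to_datetime([
    "2016-01-18", "2016-02-01", "2016-02-15", "2016-02-29", "2016-03-14",
    "2016-04-04", "2016-04-11", "2016-04-25", "2016-05-16", "2016-05-30",
    "2016-06-20", "2016-06-27", "2016-07-18", "2016-08-01", "2016-08-15",
    "2016-08-22", "2016-09-12", "2016-10-03", "2016-11-07", "2016-11-14",
    "2016-11-21", "2016-12-05", "2016-12-19",
])

# Sample input from the question (column labels as strings: an assumption).
data_extracted = pd.DataFrame(
    {
        "400115897090": [nan, 5914, nan, nan, 6214, 6766, 6454, 6070, 6778, 6856,
                         7132, 7384, 8830, 9448, 8824, 8500, 6226, 6754, nan, 7858,
                         8212, nan, 9428],
        "500109158982": [nan, 8776, nan, 4402, 6880, 7942, 7528, 7534, 7066, nan,
                         7138, 7426, 8614, 9166, 9676, 8974, 6868, 7426, 8296, 8116,
                         9070, nan, 8284],
    },
    index=pd.Index(dates, name="DATE"),
)
data_extracted.columns.name = "ARTICLE_NUMBER"

# The same pipeline as in the question.
data_extracted = data_extracted.interpolate(method='linear', axis=0).ffill().bfill()
data_extracted = data_extracted.replace([np.inf, -np.inf], np.nan).fillna(0)
data_pct_change = data_extracted.pct_change(axis=0).replace([np.inf, -np.inf], np.nan)
data_pct_change = data_pct_change.fillna(0)
print(data_pct_change.round(4))
```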

Answer 1

You can use the shift function in pandas to turn this into a vectorized operation. First, make sure DATE is your index; if you have already set DATE as the index, you can skip this step.

data_extracted.set_index("DATE", inplace=True)
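
If DATE is still an ordinary column (for example, after reading a CSV), this step might look like the following minimal sketch. The two-row frame, the pd.to_datetime conversion and the sort_index() call are illustrative assumptions rather than part of the original answer; sorting matters because shift() works on row order, not on dates.

```python
import pandas as pd

# Hypothetical frame where DATE is a regular column, rows out of order.
data_extracted = pd.DataFrame({
    "DATE": ["2016-08-22", "2016-08-15"],
    "500109158982": [8974.0, 9676.0],
})

# Optional: parse the date strings into real timestamps.
data_extracted["DATE"] = pd.to_datetime(data_extracted["DATE"])

# Make DATE the index, as in the answer.
data_extracted.set_index("DATE", inplace=True)

# shift() follows row order, so put the rows in chronological order.
data_extracted = data_extracted.sort_index()
print(data_extracted)
```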

Next, you can make a new DataFrame that will shift all the rows down by one.

shifted = data_extracted.shift(1)
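
A minimal sketch of what shift(1) produces, using three consecutive rows of column 500109158982 from the sample input: each row now holds the previous row's value, and the first row becomes NaN because it has no predecessor.

```python
import pandas as pd

# 2016-08-01 .. 2016-08-22 rows of column 500109158982 from the sample.
data_extracted = pd.DataFrame(
    {"500109158982": [9166.0, 9676.0, 8974.0]},
    index=pd.to_datetime(["2016-08-01", "2016-08-15", "2016-08-22"]),
)
data_extracted.index.name = "DATE"

# Every value moves down one row; the first row is filled with NaN.
shifted = data_extracted.shift(1)
print(shifted)
```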

Now you can compute the percentage change directly from these two DataFrames:

pct_change = (data_extracted - shifted) / shifted
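
As a sanity check, the manual formula should agree with the built-in DataFrame.pct_change() whenever there are no missing values. A minimal sketch using the 2016-08-01 to 2016-08-22 rows of the sample, which contain no NaNs; np.allclose is used because (a - b) / b and a / b - 1 can differ in the last bit.

```python
import numpy as np
import pandas as pd

# 2016-08-01 .. 2016-08-22 rows of the sample input (no missing values).
data_extracted = pd.DataFrame(
    {
        "400115897090": [9448.0, 8824.0, 8500.0],
        "500109158982": [9166.0, 9676.0, 8974.0],
    },
    index=pd.to_datetime(["2016-08-01", "2016-08-15", "2016-08-22"]),
)

shifted = data_extracted.shift(1)
manual = (data_extracted - shifted) / shifted

# Built-in version: same idea, same NaN in the first row.
builtin = data_extracted.pct_change()

print(manual.round(4))
print(np.allclose(manual, builtin, equal_nan=True))  # True
```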

If either DataFrame has a NaN value at a given position, the result in pct_change will be NaN at that position. For the 2016-08-22 example in your question, the value is about -0.07, which is what you would expect when going from 9676 to 8974.
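
A minimal sketch of both points, using the rows of column 500109158982 around the missing 2016-05-30 value: the row whose own value is NaN and the row right after it both come out as NaN. The last line checks the 2016-08-15 to 2016-08-22 step on its own.

```python
import numpy as np
import pandas as pd

# 2016-04-25 .. 2016-06-20 rows of column 500109158982 from the sample.
data_extracted = pd.DataFrame(
    {"500109158982": [7534.0, 7066.0, np.nan, 7138.0]},
    index=pd.to_datetime(["2016-04-25", "2016-05-16", "2016-05-30", "2016-06-20"]),
)

shifted = data_extracted.shift(1)
pct = (data_extracted - shifted) / shifted
# 2016-04-25: no previous row -> NaN
# 2016-05-30: current value is NaN -> NaN
# 2016-06-20: previous value is NaN -> NaN
print(pct)

# The 2016-08-15 -> 2016-08-22 step from the question.
print(round((8974.0 - 9676.0) / 9676.0, 2))  # -0.07
```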

Accepted answer, 2018-07-19 14:07:44

pandas pct_change unrealistic values

Question

1 answers

solution1 1 ACCPTED 2018-07-19 14:07:44

solution1
1 ACCPTED 2018-07-19 14:07:44