I execute the following Python code:
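import numpy as np

# Fill gaps by linear interpolation, then pad any remaining leading/trailing NaNs
data_extracted = data_extracted.interpolate(method='linear', axis=0).ffill().bfill()
# Replace infinities with NaN, then set all remaining NaNs to 0
data_extracted = data_extracted.replace([np.inf, -np.inf], np.nan).fillna(0)
# Row-to-row percentage change; infinities (division by zero) become NaN, then 0
data_pct_change = data_extracted.pct_change(axis=0).replace([np.inf, -np.inf], np.nan)
data_pct_change = data_pct_change.fillna(0)
print(data_pct_change)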
This is the input (data_extracted, example):
ARTICLE_NUMBER 400115897090 500109158982
DATE
2016-01-18 NaN NaN
2016-02-01 5914.0 8776.0
2016-02-15 NaN NaN
2016-02-29 NaN 4402.0
2016-03-14 6214.0 6880.0
2016-04-04 6766.0 7942.0
2016-04-11 6454.0 7528.0
2016-04-25 6070.0 7534.0
2016-05-16 6778.0 7066.0
2016-05-30 6856.0 NaN
2016-06-20 7132.0 7138.0
2016-06-27 7384.0 7426.0
2016-07-18 8830.0 8614.0
2016-08-01 9448.0 9166.0
2016-08-15 8824.0 9676.0
2016-08-22 8500.0 8974.0
2016-09-12 6226.0 6868.0
2016-10-03 6754.0 7426.0
2016-11-07 NaN 8296.0
2016-11-14 7858.0 8116.0
2016-11-21 8212.0 9070.0
2016-12-05 NaN NaN
2016-12-19 9428.0 8284.0
Then the code above is executed and I get the following result:
ARTICLE_NUMBER 400115897090 500109158982
DATE
2016-01-18 0.000000 0.000000
2016-02-01 0.000000 0.000000
2016-02-15 0.000000 0.000000
2016-02-29 0.000000 0.000000
2016-03-14 0.000000 0.000000
2016-04-04 0.000000 0.000000
2016-04-11 0.000000 0.000000
2016-04-25 0.000000 0.000000
2016-05-16 0.000000 0.000000
2016-05-30 0.000000 0.000000
2016-06-20 0.000000 0.000000
2016-06-27 0.000000 0.000000
2016-07-18 0.000000 0.000000
2016-08-01 0.000000 0.000000
2016-08-15 0.000000 0.000000
2016-08-22 13.384615 252.600000
2016-09-12 -0.221925 0.807571
2016-10-03 0.407216 0.172339
2016-11-07 -0.104396 -0.109044
2016-11-14 0.053170 0.299499
2016-11-21 -0.029773 -0.020572
2016-12-05 0.111074 -0.798490
2016-12-19 0.099970 4.998371
Why do I get such wrong results? I know about floating-point precision, but this is really weird. For example, on 2016-08-22 the output reports 252.600000 for the second column, although the value went from 9676 to 8974. That is definitely wrong, as are the 0.000000 values. Can anyone explain why? This is Python 3 with pandas 0.22.0. Thanks a lot.
You can use the shift function in pandas to turn this into a vectorized operation. The first thing to do is to make sure DATE is your index. If you have already set DATE as your index, you can skip this step.
data_extracted.set_index("DATE", inplace=True)
Next, you can make a new DataFrame that will shift all the rows down by one.
shifted = data_extracted.shift(1)
Now you can do a simple percentage-change calculation with these two DataFrames:
pct_change = (data_extracted - shifted) / shifted
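As a quick check of the shift-based formula, here is a minimal sketch on four rows copied from the question. The names data, manual and builtin are only illustrative, and the fill_method=None argument assumes a reasonably recent pandas version (it may not be accepted on 0.22.0).

```python
import numpy as np
import pandas as pd

# Four consecutive rows taken from the question's input (no NaNs in this slice)
data = pd.DataFrame(
    {
        "400115897090": [9448.0, 8824.0, 8500.0, 6226.0],
        "500109158982": [9166.0, 9676.0, 8974.0, 6868.0],
    },
    index=pd.to_datetime(["2016-08-01", "2016-08-15", "2016-08-22", "2016-09-12"]),
)
data.index.name = "DATE"

# Shift every row down by one so each row lines up with its predecessor
shifted = data.shift(1)

# Percentage change computed by hand: (current - previous) / previous
manual = (data - shifted) / shifted

# Built-in equivalent; fill_method=None disables any implicit forward fill
builtin = data.pct_change(fill_method=None)

print(manual)
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```

The first row is NaN in both results because it has no predecessor; from the second row on, the hand-written formula and pct_change should agree.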
If a NaN value is present for a row in either DataFrame, the result will be NaN in pct_change. The value for the '2016-08-22' example in your question (second column) is -0.07, which is expected given the values 9676 and 8974.
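To illustrate the NaN behaviour, here is a small sketch using the last six rows of the first column from the question, gaps included. The names sample, change and valid are made up for this example.

```python
import numpy as np
import pandas as pd

# Last six rows of column 400115897090 from the question, NaNs left in place
sample = pd.DataFrame(
    {"400115897090": [6754.0, np.nan, 7858.0, 8212.0, np.nan, 9428.0]},
    index=pd.to_datetime(
        ["2016-10-03", "2016-11-07", "2016-11-14",
         "2016-11-21", "2016-12-05", "2016-12-19"]
    ),
)
sample.index.name = "DATE"

shifted = sample.shift(1)
change = (sample - shifted) / shifted

# A row is NaN whenever the current or the previous value is missing
print(change)

# Only rows where both values are present survive
valid = change.dropna()
print(valid)
```

In this slice only 2016-11-21 has both a current and a previous value, so it is the only row left after dropna(). Whether to interpolate the gaps first or keep the NaN rows depends on what the changes should mean for your data.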