
Time Series Stationarity Technique

I am working with non-stationary time series data. I applied .diff(periods=n) to difference the data and remove trend and seasonality from it.

With .diff(periods=n), the observation n time steps earlier (t-n) is subtracted from the current observation (t); with periods=1 that is the previous time step (t-1).
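A small sketch of what the periods argument controls, on a made-up series (the values are arbitrary and only for illustration):

```python
import pandas as pd

# Toy series (assumed data, purely for illustration)
s = pd.Series([10, 13, 11, 18, 20, 25])

# periods=1: each value minus the value one step earlier (t-1)
d1 = s.diff(periods=1)
# periods=2: each value minus the value two steps earlier (t-2)
d2 = s.diff(periods=2)

print(pd.DataFrame({"s": s, "diff(1)": d1, "diff(2)": d2}))
```

The first n entries of diff(periods=n) are NaN, since no observation exists n steps before them. Those are exactly the values that have to be supplied again when inverting.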

Now I want to invert the differenced data back to its original scale, but I am having trouble doing so.

My code for differencing:

data_diff = df.diff(periods=1)   # row t minus row t-1; the first row becomes NaN

data_diff.head(5)

My code for inverting the differenced data back to its original scale:

import pandas as pd

cols = df.columns
x = []
for col in cols:
    # row t: df[t] plus the difference stored at row t+1
    diff_results = df[col] + data_diff[col].shift(-1)
    x.append(diff_results)
diff_df_inverted = pd.concat(x, axis=1)

diff_df_inverted

As the last output shows, the values are back on their original scale. However, the inverted value for the first row is missing: every value comes out shifted up by one row. Why does this happen, and what am I missing?

Thank you!

In this line:

diff_results = df[col] + data_diff[col].shift(-1)

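The first row of data_diff is NaN, because there is no earlier observation to subtract, so the differences effectively start at the second row. On top of that, .shift(-1) moves every difference up one row, so at row t you compute df[t] + (df[t+1] - df[t]) = df[t+1]. Each row therefore receives the next row's original value, the first original value never appears, and the last row becomes NaN. That is why the result looks shifted up.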
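To see the shift concretely, here is a small sketch with a made-up two-column frame. It uses the vectorized form df + data_diff.shift(-1), which, as far as I can tell, gives the same frame as the question's per-column loop with pd.concat:

```python
import pandas as pd

# Small stand-in for the question's df (assumed data)
df = pd.DataFrame({"a": [1.0, 4.0, 9.0, 16.0], "b": [2.0, 3.0, 5.0, 8.0]})
data_diff = df.diff(periods=1)

# The question's inversion: row t becomes df[t] + (df[t+1] - df[t]) = df[t+1]
inverted = df + data_diff.shift(-1)

print(inverted)
# Compare with the original frame moved up one row (last row NaN)
print(inverted.equals(df.shift(-1)))
```

For this data the "inverted" frame is identical to df.shift(-1): the original values are recovered, but each one lands one row too early.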

A simpler solution is data_diff.cumsum(): a cumulative sum undoes a first difference (.diff(periods=1)), provided you also supply the starting value that differencing discards.
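Why the starting value is needed follows from telescoping the first differences. Writing $x_t$ for the value in row $t$ (0-based) and $\Delta x_t$ for the corresponding entry of data_diff:

```latex
\Delta x_t = x_t - x_{t-1} \quad (t \ge 1)
\qquad\Longrightarrow\qquad
x_t = x_0 + \sum_{k=1}^{t} \Delta x_k = \sum_{k=0}^{t} \Delta x_k
\quad \text{if we set } \Delta x_0 := x_0 .
```

Setting $\Delta x_0 := x_0$ is exactly "put the original first row where the NaN was", after which a plain cumulative sum reproduces every $x_t$.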

All you have to do is replace the NaN first row of data_diff with the first row of df. That row is needed because it is the starting point to which every later difference is added. After that, data_diff.cumsum() gives you back the original data.

Here is the detailed code.

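data_diff.iloc[0] = df.iloc[0]   # replace the leading NaN row with the original first observation
a = data_diff.cumsum()           # running sum of the differences rebuilds the original values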
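The two lines above cover periods=1. Since the question mentions .diff(periods=n), here is a minimal sketch of how the same idea might generalize, assuming periods >= 1 and that the first `periods` rows of the original frame are still available. The helper name undiff is hypothetical, not a pandas API. With lag n, x[t] = x[t-n] + diff[t], so the rows split into n interleaved chains (t mod n), and each chain is rebuilt with its own cumulative sum:

```python
import numpy as np
import pandas as pd


def undiff(diffed: pd.DataFrame, original: pd.DataFrame, periods: int = 1) -> pd.DataFrame:
    """Hypothetical helper: invert diffed = original.diff(periods=periods).

    Assumes periods >= 1 and that original still holds its first `periods` rows.
    """
    seeded = diffed.copy()
    # Replace the leading NaN rows with the original starting values
    seeded.iloc[:periods] = original.iloc[:periods].to_numpy()
    # Label each row by its position modulo `periods`; each label is one chain
    chain = np.arange(len(seeded)) % periods
    # Cumulative sum within each chain: x[t] = x[t - periods] + diff[t]
    return seeded.groupby(chain).cumsum()


# Usage on made-up data: the round trip should reproduce df for several lags
df = pd.DataFrame({"a": [1.0, 4.0, 9.0, 16.0, 25.0, 36.0],
                   "b": [2.0, 3.0, 5.0, 8.0, 13.0, 21.0]})
for n in (1, 2, 3):
    restored = undiff(df.diff(periods=n), df, periods=n)
    print(n, np.allclose(restored.to_numpy(), df.to_numpy()))
```

With periods=1 there is a single chain, so this reduces to the answer's approach: seed the first row, then call cumsum() on the whole frame.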
