Extract p-values into a list for the adfuller test (test for stationarity) in ARIMA time series modeling with Python pandas
df
Col1 Col2 Col3
12 10 3
3 5 2
100 12 10
and so on...
I am writing code for further tests for ARIMA modeling of a time series. (The p-value will be computed for every column of the DataFrame df.)
import statsmodels.tsa.stattools as tsa
adf_results = {}
for col in df.columns.values:
    adf_results[col] = tsa.adfuller(df[col])
With this code I get output in the following format (this is what is printed when I enter adf_results):
[IN] adf_results
[OUT]
{'Col1': (-4.236149193618492,
  0.0005719678593039654,  # This is the second value for this column / p-value
  0,
  37,
  {'1%': -3.6209175221605827,
   '5%': -2.9435394610388332,
   '10%': -2.6104002410518627},
  138.66116123406837),
 'Col2': (-3.707023043984407,
  0.004015446231411924,  # This is the second value for this column / p-value
  0,
  37,
  {'1%': -3.6209175221605827,
   '5%': -2.9435394610388332,
   '10%': -2.6104002410518627},
  144.6019873130419),
 'Col3': (1.8083888603589304,
  0.9983655107052215,  # This is the second value for this column / p-value
  0,
  37,
  {'1%': -3.6209175221605827,
   '5%': -2.9435394610388332,
   '10%': -2.6104002410518627},
  -74.4384052778039)}
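and so on.
In this question, the second values (the p-values) are
0.0005719678593039654, 0.004015446231411924 and 0.9983655107052215 for the 3 columns taken.
I need the columns whose second value is > 0.05 in one list, and the columns whose p-value is < 0.05 in another list.
So one list would be Col1 and Col2 (second value / p-value < 0.05), and the other list would be Col3 (second value / p-value > 0.05).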
import pandas as pd
from io import StringIO
data = StringIO("""
Col1 Col2 Col3
12 10 3
3 5 2
100 12 10
13 4 1
""")
# load data into data frame
df = pd.read_csv(data, sep=' ')
import statsmodels.tsa.stattools as tsa
adf_results = {}
# run the ADF test on each column; element [1] of each result is the p-value
for col in df.columns.values:
    adf_results[col] = tsa.adfuller(df[col])
# loop over dictionary data
columns_big = []
columns_small = []
for key, value in adf_results.items():
    # value[1] is the p-value returned by adfuller
    if value[1] > 0.05:
        columns_big.append(key)
    else:
        columns_small.append(key)
Output (for the 4-row sample data above):
columns_big = ['Col1', 'Col3']
columns_small = ['Col2']
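The same split can be wrapped in a small helper, which may be easier to reuse with a different significance level. The following is only a sketch: the function name split_by_p_value and its alpha parameter are made up for illustration and are not part of statsmodels. It assumes each result is a tuple whose element [1] is the p-value, as in the output shown in the question. To keep it runnable without statsmodels, it uses the statistics and p-values printed in the question (truncated to their first two elements) instead of calling adfuller again.

```python
def split_by_p_value(results, alpha=0.05):
    """Split column names by p-value (hypothetical helper, not a statsmodels API).

    results maps a column name to an adfuller-style tuple whose
    element [1] is the p-value. Returns (above, at_or_below), where
    `above` holds columns with p > alpha and `at_or_below` the rest.
    """
    above, at_or_below = [], []
    for name, result in results.items():
        p_value = result[1]
        if p_value > alpha:
            above.append(name)
        else:
            at_or_below.append(name)
    return above, at_or_below


# Test statistic and p-value copied from the question's output;
# the remaining tuple elements are omitted because only [1] is used.
sample_results = {
    "Col1": (-4.236149193618492, 0.0005719678593039654),
    "Col2": (-3.707023043984407, 0.004015446231411924),
    "Col3": (1.8083888603589304, 0.9983655107052215),
}

# All p-values in one place, keyed by column name.
p_values = {name: result[1] for name, result in sample_results.items()}

columns_big, columns_small = split_by_p_value(sample_results)
print(columns_big)    # ['Col3']
print(columns_small)  # ['Col1', 'Col2']
```

With the question's p-values this gives Col3 in the first list and Col1, Col2 in the second, which matches what the question asks for. The answer's output differs because it runs adfuller on a 4-row toy data set.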