Pipeline auto_arima/SARIMAX

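I want to use auto_arima because I want SARIMAX and I want it to choose the best parameters. My code runs through a bunch of different combinations (because it is forecasting multiple items), and it runs several models for each one so that at the end I can pick the "best" model per combination. I have already created all the right combinations earlier in the code (using dictionaries etc.), but is there a way to get the best parameters for each combination from the stepwise model, so that it automatically fits with those "best" parameters? I know you can normally enter the best parameters by hand and then fit the model, but is there a way to automate this?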


results_dict = {}
preds_dict = {}
all_preds_df = pd.DataFrame({k: [] for k in df.columns})
all_preds_df2 = pd.DataFrame({k: [] for k in df2.columns})
all_preds_df3 = pd.DataFrame({k: [] for k in test.columns})

for itemnum in itemnumbers:
    for depot in depots:
        for department in departments:

            try: 
                df = data_dict[itemnum][depot][department] #train
                df2 = data_dict2[itemnum][depot][department] #test
                train = data_dict_train[itemnum][depot][department] #time series
                test = data_dict_test[itemnum][depot][department] #time series
                print(itemnum, depot, department)
            except KeyError:
                continue            

            # Set X and y
            X_train_date = df['Date']
            X_train = df.drop(['QUANTITY', 'Date', 'DESCRIPTION', 'NAME', 'Max_SEASON_DESC'], axis=1)
            y_train = df['QUANTITY']
            
            X_test_date = df2['Date']
            X_test = df2.drop(['QUANTITY', 'Date', 'DESCRIPTION', 'NAME', 'Max_SEASON_DESC'], axis=1)
            y_test = df2['QUANTITY']
            
            #Train and Test for Time Series - this is kind of not needed
            train_data = train
            test_data = test
            
            #Scaled Data

            # Establish model
            
            #Linear Regression
            
            #Random Forest
            
            #Gradient Boosted Tree with Grid Search
            parameters = {
                "n_estimators":[1,5,10,25,50,100,250,500],
                "max_depth":[1,3,5,7,9],
                "learning_rate":[1, 2, 3, 4, 5],
                "criterion": ['friedman_mse', 'mse', 'mae']}
            gbr = GradientBoostingRegressor()
            gbr_grid = GridSearchCV(estimator=gbr, param_grid=parameters, cv = 3)
            
            gbr_grid.fit(X_train, y_train)
            y_pred_gbr =gbr_grid.predict(X_train)
            y_pred_test_gbr =gbr_grid.predict(X_test)

            #Multinomial Naive Bayes
            
            #Time Series
            #Holt Winter
            holt_winter = ExponentialSmoothing(np.asarray(train_data['Sum_QUANTITY']), seasonal_periods=12, trend='add', seasonal='add')
            hw_fit = holt_winter.fit()
            hw_forecast = hw_fit.forecast(len(test_data))
            
            stepwise_model = auto_arima(train_data['Sum_QUANTITY'], start_p=1, start_q=1,
                                        max_p=3, max_q=3, m=12,
                                        start_P=0, seasonal=True,
                                        d=1, D=1, trace=True,
                                        error_action='ignore',  
                                        suppress_warnings=True, 
                                        stepwise=True)
            
            sm_fit = stepwise_model.fit(train_data)
            
            predictions_sm = sm_fit.forecast(len(test_data.Sum_QUANTITY))
            predictions_sm = pd.Series(predictions_sm, index=test_data.index)
            residuals_sm = test_data.Sum_QUANTITY - predictions_sm
            
            # Create forecast columns
            test['SM_Prediction'] = predictions_sm
            
            #df2 = df2.merge(pd.DataFrame(y_pred_test), how='left', on=['Date'])
            all_preds_df = pd.concat([all_preds_df, df])
            all_preds_df2 = pd.concat([all_preds_df2, df2])
            all_preds_df3 = pd.concat([all_preds_df3, test])

#Finish timer        
stop = timeit.default_timer()
print('Time: ', stop - start)

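The code above has been trimmed so it is easier to follow. Partway through the run it prints the following search trace, which ends in this error: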

Performing stepwise search to minimize aic
 ARIMA(1,1,1)(0,1,1)[12]             : AIC=inf, Time=0.30 sec
 ARIMA(0,1,0)(0,1,0)[12]             : AIC=582.161, Time=0.01 sec
 ARIMA(1,1,0)(1,1,0)[12]             : AIC=571.883, Time=0.05 sec
 ARIMA(0,1,1)(0,1,1)[12]             : AIC=inf, Time=0.21 sec
 ARIMA(1,1,0)(0,1,0)[12]             : AIC=580.572, Time=0.01 sec
 ARIMA(1,1,0)(2,1,0)[12]             : AIC=inf, Time=0.49 sec
 ARIMA(1,1,0)(1,1,1)[12]             : AIC=inf, Time=0.32 sec
 ARIMA(1,1,0)(0,1,1)[12]             : AIC=inf, Time=0.20 sec
 ARIMA(1,1,0)(2,1,1)[12]             : AIC=541.710, Time=0.82 sec
 ARIMA(1,1,0)(2,1,2)[12]             : AIC=inf, Time=0.95 sec
 ARIMA(1,1,0)(1,1,2)[12]             : AIC=inf, Time=0.69 sec
 ARIMA(0,1,0)(2,1,1)[12]             : AIC=inf, Time=0.58 sec
 ARIMA(2,1,0)(2,1,1)[12]             : AIC=572.817, Time=0.33 sec
 ARIMA(1,1,1)(2,1,1)[12]             : AIC=inf, Time=0.81 sec
 ARIMA(0,1,1)(2,1,1)[12]             : AIC=541.502, Time=0.82 sec
 ARIMA(0,1,1)(1,1,1)[12]             : AIC=inf, Time=0.40 sec
 ARIMA(0,1,1)(2,1,0)[12]             : AIC=540.432, Time=0.49 sec
 ARIMA(0,1,1)(1,1,0)[12]             : AIC=572.374, Time=0.06 sec
 ARIMA(0,1,0)(2,1,0)[12]             : AIC=inf, Time=0.38 sec
 ARIMA(1,1,1)(2,1,0)[12]             : AIC=inf, Time=0.65 sec
 ARIMA(0,1,2)(2,1,0)[12]             : AIC=542.396, Time=0.50 sec
 ARIMA(1,1,2)(2,1,0)[12]             : AIC=inf, Time=1.12 sec
 ARIMA(0,1,1)(2,1,0)[12] intercept   : AIC=540.868, Time=0.54 sec

Best model:  ARIMA(0,1,1)(2,1,0)[12]          
Total fit time: 10.734 seconds

ValueError: could not convert string to float: '20/06/4'

The auto_arima() function automatically returns the best model as an ARIMA model, so you already have it stored in your stepwise_model, which you also use for training, predicting, etc. You can read the selected parameters from this model:

order = stepwise_model.order
seasonal_order = stepwise_model.seasonal_order

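Because you created the model with auto_arima(train_data['Sum_QUANTITY'], ...), you cannot train it with more variables than Sum_QUANTITY. You don't need this line here, because the model has already been trained by auto_arima():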

sm_fit = stepwise_model.fit(train_data)

However, if you do retrain the model, you must pass it the same features it was created with.
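
The ValueError in the question looks consistent with the whole train_data frame, including a date-like string column, being passed where only numeric values are expected. The sketch below illustrates that reading with plain pandas and NumPy. The small frame, its "Date" column, and the helper as_float_array are made up for illustration and are not taken from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_data: one string date column, one numeric target.
train_data = pd.DataFrame({
    "Date": ["20/06/1", "20/06/2", "20/06/3", "20/06/4"],
    "Sum_QUANTITY": [12.0, 15.0, 11.0, 14.0],
})


def as_float_array(data):
    """Convert the input to a float array, as a numeric model has to do at some point."""
    return np.asarray(data, dtype=float)


# Passing the whole frame drags the string column into the conversion.
try:
    as_float_array(train_data)
    whole_frame_error = None
except ValueError as exc:
    whole_frame_error = str(exc)

# Passing only the column the model was created with converts cleanly.
y = as_float_array(train_data["Sum_QUANTITY"])
```

So if a refit is needed, passing train_data['Sum_QUANTITY'] rather than train_data keeps the input identical to what auto_arima() saw.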

