
How to calculate prediction for data_test using 1 predictor

I need to calculate the prediction accuracy on the test data in h2o using only dep_delay (dep_delay > 30) as the predictor.

First I specify the response:

response <- "late_arrival"

Then I specify the predictor:

predictors <- filter(flights, flights$dep_delay>30)

Then I fit the glm with this call:

> flights_test_delay_glm <- h2o.glm(training_frame=flights_test, x=predictors, y=response, family="binomial")

I get this error:

Error in .verify_dataxy(training_frame, x, y) : 
  `x` must be column names or indices

I did cross-check the predictors, and they look fine:

summary(predictors)


    X               year          month             day           dep_time   
 Min.   :    86   Min.   :2013   Min.   : 1.000   Min.   : 1.00   Min.   :   1  
 1st Qu.:103457   1st Qu.:2013   1st Qu.: 4.000   1st Qu.: 9.00   1st Qu.:1428  
 Median :186217   Median :2013   Median : 6.000   Median :16.00   Median :1755  
 Mean   :178012   Mean   :2013   Mean   : 6.372   Mean   :15.79   Mean   :1676  
 3rd Qu.:253087   3rd Qu.:2013   3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:2028  
 Max.   :336764   Max.   :2013   Max.   :12.000   Max.   :31.00   Max.   :2400  

 sched_dep_time   dep_delay          arr_time    sched_arr_time   arr_delay      
 Min.   : 500   Min.   :  31.00   Min.   :   1   Min.   :   1   Min.   : -42.00  
 1st Qu.:1334   1st Qu.:  44.00   1st Qu.:1308   1st Qu.:1457   1st Qu.:  39.00  
 Median :1645   Median :  66.00   Median :1841   Median :1841   Median :  65.00  
 Mean   :1581   Mean   :  86.82   Mean   :1598   Mean   :1730   Mean   :  83.29  
 3rd Qu.:1910   3rd Qu.: 107.00   3rd Qu.:2134   3rd Qu.:2112   3rd Qu.: 108.00  
 Max.   :2359   Max.   :1301.00   Max.   :2400   Max.   :2359   Max.   :1272.00  
                                  NA's   :216                   NA's   :386      
    carrier          flight          tailnum      origin           dest      
 EV     :11655   Min.   :   1.0   N15910 :   84   EWR:19914   ORD    : 2653  
 B6     : 8411   1st Qu.: 619.5   N258JB :   79   JFK:15241   ATL    : 2268  
 UA     : 7617   Median :1692.0   N14573 :   78   LGA:13136   BOS    : 1840  
 DL     : 4982   Mean   :2250.0   N15980 :   77               MCO    : 1814  
 MQ     : 3730   3rd Qu.:4100.0   N725MQ :   77               SFO    : 1733  
 AA     : 3537   Max.   :8500.0   N12921 :   76               FLL    : 1708  
 (Other): 8359                    (Other):47820               (Other):36275  
    air_time        distance           hour           minute     
 Min.   : 20.0   Min.   :  80.0   Min.   : 5.00   Min.   : 0.00  
 1st Qu.: 77.0   1st Qu.: 483.0   1st Qu.:13.00   1st Qu.:10.00  
 Median :120.0   Median : 762.0   Median :16.00   Median :29.00  
 Mean   :140.7   Mean   : 971.2   Mean   :15.54   Mean   :27.57  
 3rd Qu.:171.0   3rd Qu.:1134.0   3rd Qu.:19.00   3rd Qu.:45.00  
 Max.   :666.0   Max.   :4983.0   Max.   :23.00   Max.   :59.00  
 NA's   :386                                                     
               time_hour    
 2013-08-08 19:00:00:   52  
 2013-08-08 17:00:00:   51  
 2013-07-22 17:00:00:   49  
 2013-03-08 17:00:00:   48  
 2013-06-25 17:00:00:   48  
 2013-07-28 19:00:00:   48  
 (Other)            :47995  

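I need help understanding whether I coded the predictor incorrectly, because the assignment only says I need to use dep_delay greater than 30 as the predictor. Thanks!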

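The x argument accepts a list (or vector) of column names or indices. Check the data type of predictors to verify whether you are passing a vector of names or a data frame. Here, filter() returns a data frame of rows, not a column name, which is why h2o.glm rejects it. You can see an example of how to use this argument here.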
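A minimal sketch of what that could look like, under some assumptions: the data below is a small synthetic stand-in for flights (made up for illustration, not the asker's data), the dep_delay > 30 condition is read as a row filter applied to the frame, and the model is fit on a training split and scored on a test split rather than trained on the test frame. The names hf, splits, train_late, test_late and glm_fit are hypothetical.

```r
library(h2o)
h2o.init()

# Synthetic stand-in for the flights data (an assumption, for illustration only)
set.seed(1)
n <- 500
dep_delay <- round(runif(n, -10, 200))
flights <- data.frame(
  dep_delay    = dep_delay,
  late_arrival = as.integer(dep_delay + rnorm(n, sd = 30) > 60)
)

response   <- "late_arrival"
predictors <- "dep_delay"   # a column name, not a data frame
class(predictors)           # "character", which is what `x` expects

hf <- as.h2o(flights)
# family = "binomial" needs a categorical response
hf[[response]] <- h2o.asfactor(hf[[response]])

# Apply the dep_delay > 30 condition to the rows of the frame, not to `x`
hf_late <- hf[hf$dep_delay > 30, ]

# Split into training and test frames (80/20)
splits     <- h2o.splitFrame(hf_late, ratios = 0.8, seed = 1)
train_late <- splits[[1]]
test_late  <- splits[[2]]

glm_fit <- h2o.glm(x = predictors, y = response,
                   training_frame = train_late, family = "binomial")

# Score the held-out frame and inspect the confusion matrix
perf <- h2o.performance(glm_fit, newdata = test_late)
h2o.confusionMatrix(perf)
```

If the assignment instead means a yes/no indicator for "departure delay over 30 minutes", the same pattern applies: add that indicator as a new column of the frame and pass its name as x.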
