How to calculate predictions for data_test using one predictor
Compute the prediction accuracy on the test data in h2o using only dep_delay (dep_delay > 30) as the predictor
I first specify the response:
response <- "late_arrival"
Then I specify the predictors:
predictors <- filter(flights, flights$dep_delay>30)
Then I fit the GLM:
> flights_test_delay_glm <- h2o.glm(training_frame=flights_test, x=predictors, y=response, family="binomial")
I get this error:
Error in .verify_dataxy(training_frame, x, y) :
`x` must be column names or indices
I did cross-check the predictors, and they look fine:
summary(predictors)
X year month day dep_time
Min. : 86 Min. :2013 Min. : 1.000 Min. : 1.00 Min. : 1
1st Qu.:103457 1st Qu.:2013 1st Qu.: 4.000 1st Qu.: 9.00 1st Qu.:1428
Median :186217 Median :2013 Median : 6.000 Median :16.00 Median :1755
Mean :178012 Mean :2013 Mean : 6.372 Mean :15.79 Mean :1676
3rd Qu.:253087 3rd Qu.:2013 3rd Qu.: 9.000 3rd Qu.:23.00 3rd Qu.:2028
Max. :336764 Max. :2013 Max. :12.000 Max. :31.00 Max. :2400
sched_dep_time dep_delay arr_time sched_arr_time arr_delay
Min. : 500 Min. : 31.00 Min. : 1 Min. : 1 Min. : -42.00
1st Qu.:1334 1st Qu.: 44.00 1st Qu.:1308 1st Qu.:1457 1st Qu.: 39.00
Median :1645 Median : 66.00 Median :1841 Median :1841 Median : 65.00
Mean :1581 Mean : 86.82 Mean :1598 Mean :1730 Mean : 83.29
3rd Qu.:1910 3rd Qu.: 107.00 3rd Qu.:2134 3rd Qu.:2112 3rd Qu.: 108.00
Max. :2359 Max. :1301.00 Max. :2400 Max. :2359 Max. :1272.00
NA's :216 NA's :386
carrier flight tailnum origin dest
EV :11655 Min. : 1.0 N15910 : 84 EWR:19914 ORD : 2653
B6 : 8411 1st Qu.: 619.5 N258JB : 79 JFK:15241 ATL : 2268
UA : 7617 Median :1692.0 N14573 : 78 LGA:13136 BOS : 1840
DL : 4982 Mean :2250.0 N15980 : 77 MCO : 1814
MQ : 3730 3rd Qu.:4100.0 N725MQ : 77 SFO : 1733
AA : 3537 Max. :8500.0 N12921 : 76 FLL : 1708
(Other): 8359 (Other):47820 (Other):36275
air_time distance hour minute
Min. : 20.0 Min. : 80.0 Min. : 5.00 Min. : 0.00
1st Qu.: 77.0 1st Qu.: 483.0 1st Qu.:13.00 1st Qu.:10.00
Median :120.0 Median : 762.0 Median :16.00 Median :29.00
Mean :140.7 Mean : 971.2 Mean :15.54 Mean :27.57
3rd Qu.:171.0 3rd Qu.:1134.0 3rd Qu.:19.00 3rd Qu.:45.00
Max. :666.0 Max. :4983.0 Max. :23.00 Max. :59.00
NA's :386
time_hour
2013-08-08 19:00:00: 52
2013-08-08 17:00:00: 51
2013-07-22 17:00:00: 49
2013-03-08 17:00:00: 48
2013-06-25 17:00:00: 48
2013-07-28 19:00:00: 48
(Other) :47995
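I need help understanding whether I coded the predictors incorrectly, since the task only says to use dep_delay greater than 30 as the predictor. Thanks!
The x argument accepts a list (or vector) of column names or indices. Check the data type of predictors to verify whether you are passing a vector of names or a data frame. In the code above, filter() returns a data frame (which is why summary(predictors) prints every column), so x receives a whole table rather than column names. You can see an example of how to use this argument here.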
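One way to apply this advice, as a minimal sketch rather than a definitive fix: keep the row filter (dep_delay > 30) and the predictor name as two separate things, and pass only the name to x. The toy flights data frame and the rule used to build late_arrival are assumptions made up for illustration; the question does not show how late_arrival was defined. The h2o calls used (as.h2o, h2o.glm, h2o.performance, h2o.confusionMatrix) exist in the h2o R package, and that part is skipped if the package is not installed.

```r
# Toy stand-in for the flights data (hypothetical values, not the real data set).
flights <- data.frame(
  dep_delay = c(-3, 10, 25, 31, 40, 55, 70, 95, 130, 180, 240, 300),
  arr_delay = c(-10, 2, 30, 12, 45, 20, 80, 60, 140, 170, 250, 310)
)
# Assumption: a flight counts as a late arrival when arr_delay exceeds 30 minutes.
flights$late_arrival <- as.integer(flights$arr_delay > 30)

# The row filter belongs on the data, not in `x`.
# subset() also drops rows where dep_delay is NA.
delayed <- subset(flights, dep_delay > 30)

# `x` and `y` are column NAMES (character vectors), not data frames.
response   <- "late_arrival"
predictors <- "dep_delay"

# Quick diagnosis of the original mistake: a data frame is not a name vector.
is.data.frame(delayed)     # TRUE: this is what was being passed as `x`
is.character(predictors)   # TRUE: this is what `x` expects

if (requireNamespace("h2o", quietly = TRUE)) {
  library(h2o)
  h2o.init()

  delayed_hex <- as.h2o(delayed)
  # family = "binomial" needs a categorical (factor) response in h2o.
  delayed_hex[, response] <- as.factor(delayed_hex[, response])

  fit <- h2o.glm(
    x = predictors,
    y = response,
    training_frame = delayed_hex,
    family = "binomial"
  )

  # Evaluate on a frame of your choice; the same frame is reused here only
  # because the toy example has no separate train/test split.
  perf <- h2o.performance(fit, newdata = delayed_hex)
  print(h2o.confusionMatrix(perf))
}
```

In practice the model would usually be fitted on a training split and the filtered test frame passed as newdata, so that the reported accuracy reflects data the model has not seen.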