XGBoost in Databricks with Python

Question

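I have recently been working with MLlib on a Databricks cluster and saw that, according to the documentation, XGBoost is available for my cluster version (5.1). The cluster is running Python 2.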

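My impression is that XGBoost4J is only available for Scala and Java. So my question is: how do I import the xgboost module into this environment without losing the ability to distribute the work?

A sample of my code is below: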

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer
from pyspark.ml.feature import VectorAssembler
import xgboost as xgb  # Raises an import error: the module is not installed, although it should be

# Transform class to classIndex to make xgboost happy
stringIndexer = StringIndexer(inputCol="species", outputCol="species_index").fit(newInput)
labelTransformed = stringIndexer.transform(newInput).drop("species")

# Compose feature columns as vectors (the label, species_index, is kept out of the features)
vectorCols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
vectorAssembler = VectorAssembler(inputCols=vectorCols, outputCol="features")
xgbInput = vectorAssembler.transform(labelTransformed).select("features", "species_index")

Answer 1

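You can try using spark-sklearn to distribute the Python (scikit-learn style) version of xgboost, but that kind of distribution is not the same as what xgboost4j does: spark-sklearn spreads a hyperparameter search over the cluster, with each candidate model trained on a single machine, whereas xgboost4j distributes the training of one model. I have heard that a pyspark API for xgboost4j on Databricks is coming soon, so stay tuned.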
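A minimal sketch of the spark-sklearn route, not a definitive recipe. It assumes the spark-sklearn and xgboost Python packages are installed on the cluster and that the training data fits in memory as local arrays. The function name `grid_search_xgboost` and the values in `PARAM_GRID` are hypothetical, chosen only for illustration.

```python
# Hypothetical search space; adjust the values for your own data.
PARAM_GRID = {
    "max_depth": [3, 5],
    "n_estimators": [50, 100],
}


def grid_search_xgboost(sc, X, y, param_grid=PARAM_GRID):
    """Fit one XGBClassifier per parameter combination as separate Spark tasks.

    sc: an active SparkContext (in a Databricks notebook, the predefined `sc`).
    X, y: local, driver-side arrays (for example NumPy arrays), not Spark DataFrames.
    """
    # Imported lazily so this module still loads where the packages are missing.
    from spark_sklearn import GridSearchCV  # assumes spark-sklearn is installed
    from xgboost import XGBClassifier  # assumes the xgboost Python package is installed

    # Each candidate is trained by single-machine xgboost inside one Spark task;
    # only the search over param_grid is spread across the cluster.
    search = GridSearchCV(sc, XGBClassifier(), param_grid)
    search.fit(X, y)
    return search
```

In a notebook this would be called as `search = grid_search_xgboost(sc, X, y)`, after which `search.best_params_` holds the winning combination. Because every task receives the full `X` and `y`, this approach only helps when the data set is small enough for a single machine; it does not replace the data-parallel training that xgboost4j provides.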

Answer 2

By the way, the relevant pull request can be found here.

Answer 1 (accepted): 1 vote, 2019-03-08 22:40:24

Answer 2: 1 vote, 2019-08-18 14:34:20
