简体   繁体   中英

why do I have a label Problem when using Crossvalidator

I'm new to spark:) I try to use CrossValidator. My model is as follows:

training

#training data - several repartition have been tested, 50/50 seems the best
(trainData, testData) = modelData.randomSplit([0.5, 0.5])

#counting data used
print("Training dataset count : " +str(trainData.count()))
print("Test dataset count : " +str(testData.count()))
trainData.cache()
testData.cache()

Model

from pyspark.ml.classification import LogisticRegression
lr = LogisticRegression(featuresCol = 'features', labelCol = 'v4_Indexer', maxIter = 5)
lrModel = lr.fit(trainData)
predictions = lrModel.transform(testData)
predictions.select('v4_Indexer','features','rawPrediction', 'prediction', 'probability').toPandas().head(2500)

I try this code for crossvalidation:

from pyspark.ml.tuning import ParamGridBuilder, CrossValidator from pyspark.ml import Pipeline

pipeline = Pipeline(stages=[lr])
paramGrid = (ParamGridBuilder() .addGrid(lr.regParam, [0,0.5,1]).addGrid(lr.elasticNetParam, [0,0.5,1]).addGrid(lr.maxIter,[1,10]).build())
cv = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid, evaluator=evaluator, numFolds=5)
cvModel = cv.fit(trainData)
trainingSummary = cvModel.bestModel

I have a warning /databricks/spark/python/pyspark/ml/util.py:92: UserWarning: CrossValidator_7ba8c8c903af fit call failed but some spark jobs may still running for unfinished trials. To address this issue, you should enable pyspark pinned thread mode. warnings.warn("{} fit call failed but some spark jobs "

And an error: IllegalArgumentException: label does not exist. Available: v4_Indexer, features, CrossValidator_7ba8c8c903af_rand

this model worked for a while. I do not understand why it doesn't now.

Thx in advance for any help you could bring me =)

Problem solved: my trainData and testData where cache(), thus it couldn't find them:)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM