

Scala java.io toArray error when zipping feature importance vector to column names array

When I try to zip the feature importance vector returned by LightGBM's getFeatureImportances with an array of column names, I run into the error shown below:

import com.microsoft.ml.spark.LightGBMClassificationModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

def getFeatureImportances(inputContainer: PipelineModelContainer): (String, String) = {
    val transformer = inputContainer.pipelineModel.stages.last

    val featureImportancesVector = inputContainer.params match {
        case RandomForestParameters(numTrees, treeDepth, featureTransformer) =>
            transformer.asInstanceOf[RandomForestClassificationModel].featureImportances
        case LightGBMParameters(treeDepth, numLeaves, iterations, featureTransformer) => 
            transformer.asInstanceOf[LightGBMClassificationModel].getFeatureImportances("split")
    }

    val colNames = inputContainer.featureColNames
    val sortedFeatures = (colNames zip featureImportancesVector.toArray).sortWith(_._2 > _._2).zipWithIndex
}

The error refers to the last line of my code:

value toArray is not a member of java.io.Serializable

It seems the LightGBM feature importances cannot be converted to an array. The code works fine when only the RandomForestClassificationModel feature importances are involved. What else can I try?

The two branches of the match block return different types: the LightGBMClassificationModel branch returns Array[Double], while the RandomForestClassificationModel branch returns Vector.

The most specific common supertype of the two is java.io.Serializable, so Scala infers that as the type of the variable featureImportancesVector. toArray is not a member of java.io.Serializable, even though both Array[Double] and Vector provide the method.
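The inference can be reproduced without Spark or LightGBM. The following is a minimal sketch, assuming Scala 2.12 (the version commonly paired with Spark); FakeVector and LubDemo are hypothetical names invented for illustration, with FakeVector standing in for Spark's Vector.

```scala
// Hypothetical stand-in for org.apache.spark.ml.linalg.Vector (an assumption for illustration only).
final class FakeVector(values: Array[Double]) extends java.io.Serializable {
  def toArray: Array[Double] = values.clone()
}

object LubDemo {
  // The declared return type is the only type both branches share besides AnyRef/Any.
  def pick(useForest: Boolean): java.io.Serializable = {
    // One branch yields a FakeVector, the other an Array[Double].
    // With no type annotation, Scala 2 infers their least upper bound for `result`.
    val result =
      if (useForest) new FakeVector(Array(0.7, 0.3))
      else Array(12.0, 5.0)

    // result.toArray  // would not compile: value toArray is not a member of java.io.Serializable
    result
  }
}
```

Uncommenting the result.toArray line should reproduce the compile error from the question, because the static type of result no longer carries either branch's methods.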

The fix is easy, as suggested in the comment: call .toArray on featureImportances inside the RandomForest branch, so that both branches, and therefore the variable, have type Array[Double].
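A self-contained sketch of that fix is shown here. StubVector, ForestModel, GbmModel and ImportanceRanking are hypothetical stand-ins invented so the example runs without Spark or MMLSpark; only the shape of the match block mirrors the question's code, and the LightGBM return type of Array[Double] is taken from the answer above.

```scala
// Hypothetical stand-ins (assumptions): the real classes are Spark's Vector,
// RandomForestClassificationModel and MMLSpark's LightGBMClassificationModel.
final class StubVector(values: Array[Double]) extends java.io.Serializable {
  def toArray: Array[Double] = values.clone()
}

sealed trait Model
final case class ForestModel(featureImportances: StubVector) extends Model
final case class GbmModel(splitCounts: Array[Double]) extends Model {
  // The importanceType argument is ignored in this stub.
  def getFeatureImportances(importanceType: String): Array[Double] = splitCounts
}

object ImportanceRanking {
  def rank(model: Model, colNames: Array[String]): Array[((String, Double), Int)] = {
    // Convert inside the branch so both branches produce Array[Double].
    // The explicit annotation makes the compiler flag any branch that does not.
    val importances: Array[Double] = model match {
      case f: ForestModel => f.featureImportances.toArray
      case g: GbmModel    => g.getFeatureImportances("split")
    }

    // Pair each column with its importance, sort descending, attach the rank.
    colNames.zip(importances).sortWith(_._2 > _._2).zipWithIndex
  }
}
```

Annotating the variable as Array[Double] is a design choice rather than a requirement: it moves a future type mismatch to the offending branch instead of surfacing later as a missing-member error on java.io.Serializable.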


 