Debug a custom Pipeline Transformer in Flink

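I am trying to implement a custom Transformer in Flink by following the instructions in the Flink documentation, but when I try to execute it, the fit operation never seems to be called. Here is what I have done so far: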

class InfoGainTransformer extends Transformer[InfoGainTransformer] {

  import InfoGainTransformer._

  private[this] var counts: Option[collection.immutable.Vector[Map[Key, Double]]] = None

  // here setters for params, as Flink does

}

object InfoGainTransformer {

  // ====================================== Parameters =============================================
  // ...

  // ==================================== Factory methods ==========================================
  // ...

  // ========================================== Operations =========================================

  implicit def fitLabeledVectorInfoGain = new FitOperation[InfoGainTransformer, LabeledVector] {
    override def fit(instance: InfoGainTransformer, fitParameters: ParameterMap, input: DataSet[LabeledVector]): Unit = {
      val counts = collection.immutable.Vector[Map[Key, Double]]()
      input.map {
        v =>
          v.vector.map {
            case (i, value) =>
              println("INSIDE!!!")
              val key = Key(value, v.label)
              val cval = counts(i).getOrElse(key, .0)
              counts(i) + (key -> cval)
          }
      }
    }
  }

  implicit def fitVectorInfoGain[T <: Vector] = new FitOperation[InfoGainTransformer, T] {
    override def fit(instance: InfoGainTransformer, fitParameters: ParameterMap, input: DataSet[T]): Unit = {
      input
    }
  }

  implicit def transformLabeledVectorsInfoGain = {
    new TransformDataSetOperation[InfoGainTransformer, LabeledVector, LabeledVector] {
      override def transformDataSet(
                                     instance: InfoGainTransformer,
                                     transformParameters: ParameterMap,
                                     input: DataSet[LabeledVector]): DataSet[LabeledVector] = input
    }
  }

  implicit def transformVectorsInfoGain[T <: Vector : BreezeVectorConverter : TypeInformation : ClassTag] = {
    new TransformDataSetOperation[InfoGainTransformer, T, T] {
      override def transformDataSet(instance: InfoGainTransformer, transformParameters: ParameterMap, input: DataSet[T]): DataSet[T] = input
    }
  }
}

Then I tried to use it in two ways:

val scaler = StandardScaler()
val polyFeatures = PolynomialFeatures()
val mlr = MultipleLinearRegression()
val gain = InfoGainTransformer().setK(2)

// Construct the pipeline
val pipeline = scaler
  .chainTransformer(polyFeatures)
  .chainTransformer(gain)
  .chainPredictor(mlr)

val r = pipeline.predict(dataSet map (_.vector))
r.print()

And with only my transformer:

pipeline.fit(dataSet)

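In both cases, when I set a breakpoint inside fitLabeledVectorInfoGain (for example, on the input.map line), the debugger stops there. But if I also set a breakpoint inside the nested map, for example on the println("INSIDE!!!") line, it never stops there.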

Does anyone know how I can debug this custom transformer?

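It seems to be working now. I think the problem was that I had not implemented the FitOperation correctly, because nothing was being saved in the instance state. This is the implementation now: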

implicit def fitLabeledVectorInfoGain = new FitOperation[InfoGainTransformer, LabeledVector] {
  override def fit(instance: InfoGainTransformer, fitParameters: ParameterMap, input: DataSet[LabeledVector]): Unit = {
    //      val counts = collection.immutable.Vector[Map[Key, Double]]()
    val r = input.map {
      v =>
        v.vector.foldLeft(Map.empty[Key, Double]) {
          case (m, (i, value)) =>
            println("INSIDE fit!!!")
            val key = Key(value, v.label)
            val cval = m.getOrElse(key, .0) + 1.0
            m + (key -> cval)
        }
    }
    instance.counts = Some(r)
  }
}
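
The per-record fold in this fit operation can be tried without a Flink cluster. The following is only a sketch in plain Scala: `Key` is not defined in the question, so the case class here is a guess at its shape, and `Record` is a hypothetical stand-in for `LabeledVector`. As in the code above, the feature index is not used, so the counts are per (value, label) pair within one record.

```scala
object CountSketch {
  // Hypothetical stand-ins: the question does not show Key's definition,
  // and Record replaces Flink's LabeledVector so the sketch runs without Flink.
  final case class Key(value: Double, label: Double)
  final case class Record(label: Double, features: Seq[(Int, Double)])

  // Same shape of fold as in the fit operation: thread an immutable Map
  // through the features and return the updated map at each step.
  def countKeys(r: Record): Map[Key, Double] =
    r.features.foldLeft(Map.empty[Key, Double]) {
      case (m, (_, value)) =>
        val key = Key(value, r.label)
        m + (key -> (m.getOrElse(key, 0.0) + 1.0))
    }

  def main(args: Array[String]): Unit = {
    // Two features share the value 0.5, so that key is counted twice.
    val record = Record(1.0, Seq(0 -> 0.5, 1 -> 0.5, 2 -> 2.0))
    println(countKeys(record))
  }
}
```

Unlike the first attempt, which computed `counts(i) + (key -> cval)` and threw the result away, the fold returns each updated map so the counts accumulate.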

Now the debugger correctly stops at all the breakpoints, and the TransformOperation is also called.
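
A likely explanation for the earlier behaviour is that Flink's DataSet transformations are lazy: `input.map { ... }` only declares the function, and it runs when the result is consumed by an executed job. In the first version the mapped DataSet was discarded, so its function body was never reached. The sketch below is an analogy in plain Scala, not Flink code; it uses `Iterator.map`, which is also lazy, and the names `LazyMapDemo`, `declare`, and `calls` are made up for illustration.

```scala
object LazyMapDemo {
  // Counts how many times the body of the mapping function actually runs.
  var calls = 0

  // Declares a lazy map; nothing is evaluated until the iterator is consumed.
  def declare(input: List[Double]): Iterator[Double] =
    input.iterator.map { x =>
      calls += 1 // a breakpoint here is hit only on consumption
      x * 2
    }

  def main(args: Array[String]): Unit = {
    val mapped = declare(List(1.0, 2.0, 3.0))
    println(s"after declaring the map: calls = $calls")

    // Consuming the iterator plays the role of a sink or job execution.
    val result = mapped.toList
    println(s"after consuming it: calls = $calls, result = $result")
  }
}
```

If a breakpoint inside a DataSet function is never hit, it may be worth checking that the transformed DataSet is actually used by something that triggers execution, such as `print()` or `collect()`.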
