
Why don't the implicit conversions for Writable work?

SparkContext defines some implicit conversions between Writable types and their corresponding primitive types, such as LongWritable <-> Long and Text <-> String.
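For example, these conversions are what let the WritableConverter-based sequenceFile overload (quoted further down in this post) hand back plain Scala types. A minimal sketch, assuming a SequenceFile of (LongWritable, Text) records at a hypothetical path:

  // LongWritable <-> Long and Text <-> String are translated by the implicit converters
  val pairs: org.apache.spark.rdd.RDD[(Long, String)] =
    sc.sequenceFile[Long, String]("/some/path/to/a/sequence-file")  // hypothetical path
  pairs.take(3).foreach(println)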

  • Test case 1:

I use the following code to combine small files:

  @Test
  def testCombineSmallFiles(): Unit = {
    val path = "file:///d:/logs"
    val rdd = sc.newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat](path)
    println(s"rdd partition number is ${rdd.partitions.length}")
    println(s"lines is :${rdd.count()}")
  }

The above code runs fine, but if I use the following line to get the RDD, it causes a compile error:

val rdd = sc.newAPIHadoopFile[Long, String, CombineTextInputFormat](path)

It looks like the implicit conversions did not take effect. I would like to know what is wrong here and why they don't work.

  • Test case 2:

With the following code, which uses sequenceFile, the implicit conversions appear to work (Text is converted to String and IntWritable to Int):

  @Test
  def testReadWriteSequenceFile(): Unit = {
    val data = List(("A", 1), ("B", 2), ("C", 3))
    val outputDir = Utils.getOutputDir()
    sc.parallelize(data).saveAsSequenceFile(outputDir)
    // implicit conversion works for the SparkContext#sequenceFile method
    val rdd = sc.sequenceFile(outputDir + "/part-00000", classOf[String], classOf[Int])
    rdd.foreach(println)
  }

Comparing these two test cases, I cannot see the key difference that makes one work and the other not.

  • Note:

The SparkContext#sequenceFile method I use in TEST CASE 2 is:

  def sequenceFile[K, V](
      path: String,
      keyClass: Class[K],
      valueClass: Class[V]): RDD[(K, V)] = withScope {
    assertNotStopped()
    sequenceFile(path, keyClass, valueClass, defaultMinPartitions)
  }

Inside this sequenceFile method, it calls another sequenceFile overload, which in turn calls the hadoopFile method to read the data:

  def sequenceFile[K, V](path: String,
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int
      ): RDD[(K, V)] = withScope {
    assertNotStopped()
    val inputFormatClass = classOf[SequenceFileInputFormat[K, V]]
    hadoopFile(path, inputFormatClass, keyClass, valueClass, minPartitions)
  }

To use the implicit conversions, a WritableConverter is needed, for example:

   def sequenceFile[K, V]
       (path: String, minPartitions: Int = defaultMinPartitions)
       (implicit km: ClassTag[K], vm: ClassTag[V],
        kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)] = {...}

I cannot see that used anywhere in the doc of sc.newAPIHadoopFile, so it is not possible there.

Also, please verify that you used import SparkContext._ (I cannot see the imports in your post).
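For reference, here is a sketch of the imports such a test would typically need (the exact set is an assumption, since the imports are not shown in the post). Note that since Spark 1.3 the converters live in the WritableConverter companion object, so import SparkContext._ is only required on older versions:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
  import org.apache.spark.SparkContext._  // pre-1.3 home of the Writable converter implicits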

Please have a look at WritableConverter in SparkContext, which has the code below:

/**
 * A class encapsulating how to convert some type `T` from `Writable`. It stores both the `Writable`
 * class corresponding to `T` (e.g. `IntWritable` for `Int`) and a function for doing the
 * conversion.
 * The getter for the writable class takes a `ClassTag[T]` in case this is a generic object
 * that doesn't know the type of `T` when it is created. This sounds strange but is necessary to
 * support converting subclasses of `Writable` to themselves (`writableWritableConverter()`).
 */
private[spark] class WritableConverter[T](
    val writableClass: ClassTag[T] => Class[_ <: Writable],
    val convert: Writable => T)
  extends Serializable

object WritableConverter {

  // Helper objects for converting common types to Writable
  private[spark] def simpleWritableConverter[T, W <: Writable: ClassTag](convert: W => T)
  : WritableConverter[T] = {
    val wClass = classTag[W].runtimeClass.asInstanceOf[Class[W]]
    new WritableConverter[T](_ => wClass, x => convert(x.asInstanceOf[W]))
  }

  // The following implicit functions were in SparkContext before 1.3 and users had to
  // `import SparkContext._` to enable them. Now we move them here to make the compiler find
  // them automatically. However, we still keep the old functions in SparkContext for backward
  // compatibility and forward to the following functions directly.

  implicit def intWritableConverter(): WritableConverter[Int] =
    simpleWritableConverter[Int, IntWritable](_.get)

  implicit def longWritableConverter(): WritableConverter[Long] =
    simpleWritableConverter[Long, LongWritable](_.get)

  implicit def doubleWritableConverter(): WritableConverter[Double] =
    simpleWritableConverter[Double, DoubleWritable](_.get)

  implicit def floatWritableConverter(): WritableConverter[Float] =
    simpleWritableConverter[Float, FloatWritable](_.get)

  implicit def booleanWritableConverter(): WritableConverter[Boolean] =
    simpleWritableConverter[Boolean, BooleanWritable](_.get)

  implicit def bytesWritableConverter(): WritableConverter[Array[Byte]] = {
    simpleWritableConverter[Array[Byte], BytesWritable] { bw =>
      // getBytes method returns array which is longer then data to be returned
      Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
    }
  }

  implicit def stringWritableConverter(): WritableConverter[String] =
    simpleWritableConverter[String, Text](_.toString)

  implicit def writableWritableConverter[T <: Writable](): WritableConverter[T] =
    new WritableConverter[T](_.runtimeClass.asInstanceOf[Class[T]], _.asInstanceOf[T])
}
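To make the mechanism concrete, here is a small sketch of what the compiler effectively supplies for the implicit kcf/vcf parameters when you call sc.sequenceFile[Long, String](path). WritableConverter is private[spark], so this snippet is illustrative only and would not compile in user code:

  import scala.reflect.classTag
  import org.apache.hadoop.io.{LongWritable, Text}

  // The two converter factories resolved for sequenceFile[Long, String]
  val kc: WritableConverter[Long]   = WritableConverter.longWritableConverter()
  val vc: WritableConverter[String] = WritableConverter.stringWritableConverter()

  // Each converter knows which Writable class to ask Hadoop for...
  kc.writableClass(classTag[Long])   // classOf[LongWritable]
  // ...and how to turn a Writable record back into the plain Scala value.
  kc.convert(new LongWritable(42L))  // 42L
  vc.convert(new Text("hello"))      // "hello"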

EDIT:

I have updated my question and given two test cases; one works and the other does not, but I cannot figure out what the difference between them is.

Yes, the implicit conversions need a WritableConverter.

  • Test case 1, i.e. val rdd = sc.newAPIHadoopFile...(path): the implicit conversions are NOT done inside SparkContext for this method. That is why, if you pass Long, it will not work and will give a compiler error (see the workaround sketch after this list).

  • Test case 2, i.e. val rdd = sc.sequenceFile(...): here you pass classOf[...] directly. If you pass classOf[...], no implicit conversion is needed, because those are classes rather than a Long value or a String value.
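A minimal workaround sketch for test case 1, assuming the same file:///d:/logs path: since newAPIHadoopFile has no WritableConverter parameters, read with the Writable types it supports and do the conversion yourself with a map:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

  val writableRdd =
    sc.newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat]("file:///d:/logs")

  // Do explicitly what the WritableConverter implicits would have done:
  // LongWritable -> Long via get, Text -> String via toString.
  val rdd: org.apache.spark.rdd.RDD[(Long, String)] =
    writableRdd.map { case (offset, line) => (offset.get, line.toString) }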
