Why do implicit conversions for Writable not work?
SparkContext defines some implicit conversions between Writable and primitive types, such as LongWritable <-> Long and Text <-> String.
I use the following code to combine small files:
@Test
def testCombineSmallFiles(): Unit = {
  val path = "file:///d:/logs"
  val rdd = sc.newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat](path)
  println(s"rdd partition number is ${rdd.partitions.length}")
  println(s"lines is :${rdd.count()}")
}
The code above runs fine, but if I use the following line to get the RDD instead, it causes a compile error:
val rdd = sc.newAPIHadoopFile[Long, String, CombineTextInputFormat](path)
It looks like the implicit conversion does not take effect. I would like to know what is wrong here and why it does not work.
With the following code that uses sequenceFile, the implicit conversions do seem to work (Text is converted to String and IntWritable to Int):
@Test
def testReadWriteSequenceFile(): Unit = {
  val data = List(("A", 1), ("B", 2), ("C", 3))
  val outputDir = Utils.getOutputDir()
  sc.parallelize(data).saveAsSequenceFile(outputDir)
  // implicit conversion works for the SparkContext#sequenceFile method
  val rdd = sc.sequenceFile(outputDir + "/part-00000", classOf[String], classOf[Int])
  rdd.foreach(println)
}
Comparing these two test cases, I don't see the key difference that makes one work and the other fail. The SparkContext#sequenceFile method I use in TEST CASE 2 is:
def sequenceFile[K, V](
    path: String,
    keyClass: Class[K],
    valueClass: Class[V]): RDD[(K, V)] = withScope {
  assertNotStopped()
  sequenceFile(path, keyClass, valueClass, defaultMinPartitions)
}
This sequenceFile method calls another sequenceFile overload, which in turn calls the hadoopFile method to read the data:
def sequenceFile[K, V](path: String,
    keyClass: Class[K],
    valueClass: Class[V],
    minPartitions: Int
    ): RDD[(K, V)] = withScope {
  assertNotStopped()
  val inputFormatClass = classOf[SequenceFileInputFormat[K, V]]
  hadoopFile(path, inputFormatClass, keyClass, valueClass, minPartitions)
}
To use the implicit conversions, you need an overload that takes WritableConverters. For example:
def sequenceFile[K, V]
    (path: String, minPartitions: Int = defaultMinPartitions)
    (implicit km: ClassTag[K], vm: ClassTag[V],
     kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)] = {...}
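For example (a sketch of my own, assuming a live SparkContext sc and the outputDir from the question): if you call this overload with only the type parameters and no Class[_] arguments, the compiler supplies the converters implicitly:

// No classOf[...] arguments here: the compiler finds stringWritableConverter()
// and intWritableConverter() in WritableConverter's companion object and uses
// them to turn Text/IntWritable records into (String, Int).
val converted = sc.sequenceFile[String, Int](outputDir + "/part-00000")
converted.foreach(println)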
I cannot see such a WritableConverter-based variant anywhere in the documentation of sc.newAPIHadoopFile, so it is not possible there.
Also, please verify that you used import SparkContext._ (I cannot see the imports in your post).
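For reference, that import looks like the line below; per the comment in the code further down, it was required before Spark 1.3, while newer versions find the converters automatically:

// Pre-1.3 Spark: brings the implicit Writable conversions into scope.
// From 1.3 on they live in WritableConverter's companion object and are
// found automatically; the old functions are kept for compatibility.
import org.apache.spark.SparkContext._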
Please have a look at the WritableConverters in SparkContext, which have the code below:
/**
 * A class encapsulating how to convert some type `T` from `Writable`. It stores both the `Writable`
 * class corresponding to `T` (e.g. `IntWritable` for `Int`) and a function for doing the
 * conversion.
 * The getter for the writable class takes a `ClassTag[T]` in case this is a generic object
 * that doesn't know the type of `T` when it is created. This sounds strange but is necessary to
 * support converting subclasses of `Writable` to themselves (`writableWritableConverter()`).
 */
private[spark] class WritableConverter[T](
    val writableClass: ClassTag[T] => Class[_ <: Writable],
    val convert: Writable => T)
  extends Serializable

object WritableConverter {

  // Helper objects for converting common types to Writable
  private[spark] def simpleWritableConverter[T, W <: Writable: ClassTag](convert: W => T)
    : WritableConverter[T] = {
    val wClass = classTag[W].runtimeClass.asInstanceOf[Class[W]]
    new WritableConverter[T](_ => wClass, x => convert(x.asInstanceOf[W]))
  }

  // The following implicit functions were in SparkContext before 1.3 and users had to
  // `import SparkContext._` to enable them. Now we move them here to make the compiler find
  // them automatically. However, we still keep the old functions in SparkContext for backward
  // compatibility and forward to the following functions directly.

  implicit def intWritableConverter(): WritableConverter[Int] =
    simpleWritableConverter[Int, IntWritable](_.get)

  implicit def longWritableConverter(): WritableConverter[Long] =
    simpleWritableConverter[Long, LongWritable](_.get)

  implicit def doubleWritableConverter(): WritableConverter[Double] =
    simpleWritableConverter[Double, DoubleWritable](_.get)

  implicit def floatWritableConverter(): WritableConverter[Float] =
    simpleWritableConverter[Float, FloatWritable](_.get)

  implicit def booleanWritableConverter(): WritableConverter[Boolean] =
    simpleWritableConverter[Boolean, BooleanWritable](_.get)

  implicit def bytesWritableConverter(): WritableConverter[Array[Byte]] = {
    simpleWritableConverter[Array[Byte], BytesWritable] { bw =>
      // getBytes method returns an array which is longer than the data to be returned
      Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
    }
  }

  implicit def stringWritableConverter(): WritableConverter[String] =
    simpleWritableConverter[String, Text](_.toString)

  implicit def writableWritableConverter[T <: Writable](): WritableConverter[T] =
    new WritableConverter[T](_.runtimeClass.asInstanceOf[Class[T]], _.asInstanceOf[T])
}
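To illustrate the mechanism, here is a simplified standalone sketch of my own (the real WritableConverter is private[spark], so user code cannot use it directly):

import org.apache.hadoop.io.{IntWritable, Text, Writable}

// A standalone analogue of WritableConverter: just a Writable => T function.
final case class Converter[T](convert: Writable => T)

val stringConv = Converter[String](w => w.asInstanceOf[Text].toString)
val intConv    = Converter[Int](w => w.asInstanceOf[IntWritable].get)

// Converting one raw record the way sequenceFile would after reading it:
val record: (Writable, Writable) = (new Text("A"), new IntWritable(1))
val (key, value) = (stringConv.convert(record._1), intConv.convert(record._2))
// key: String = "A", value: Int = 1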
EDIT:
I have updated my question and provided two test cases, one that works and one that doesn't, but I cannot figure out what the difference between them is.
In Testcase1, i.e. val rdd = sc.newAPIHadoopFile...(path), the implicit conversion (via WritableConverter) is not done on the SparkContext side. That is why it will not work if you pass Long, and the compiler will give an error.
In TestCase2, i.e. val rdd = sc.sequenceFile(...), you are passing ClassOf[...] directly. If you pass ClassOf[...], no implicit conversion is needed, since these are classes, not a Long value or a String value.
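So with newAPIHadoopFile the practical approach (a sketch of my own, not part of the original answer) is to keep the Writable type parameters and convert the records explicitly afterwards:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

// Read with the Writable key/value types that the input format produces,
// then convert each record by hand instead of relying on implicits.
val rdd = sc
  .newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat](path)
  .map { case (offset, line) => (offset.get, line.toString) } // RDD[(Long, String)]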