
Parsing Protobuf ByteString in Spark not working after creating Encoder

I'm trying to parse protobuf (proto3) data in Spark 2.4 and I'm having trouble with the ByteString type. I've generated the case class with the ScalaPB library and loaded the jar into a Spark shell. I've also tried creating an implicit encoder for the type; however, I still get the following error:

java.lang.UnsupportedOperationException: No Encoder found for com.google.protobuf.ByteString

Here is what I've tried so far:

import proto.Event._ // my proto case class
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders.kryo

// Register our UDTs to avoid the "<none> is not a term" error:
EventProtoUdt.register()

val inputFile = "data.avro"

// Kryo-based fallback encoder for the ByteString fields
object ByteStringEncoder {
  implicit def byteStringEncoder: Encoder[com.google.protobuf.ByteString] =
    org.apache.spark.sql.Encoders.kryo[com.google.protobuf.ByteString]
}

import ByteStringEncoder._
import spark.implicits._

def parseLine(s: String): Event =
  Event.parseFrom(org.apache.commons.codec.binary.Base64.decodeBase64(s))

import scalapb.spark._
val eventsDf = spark.read.format("avro").load(inputFile)

// Pull the raw protobuf bytes out of the "Body" column and parse each record
val eventsDf2 = eventsDf.map(row => row.getAs[Array[Byte]]("Body")).map(Event.parseFrom(_))

Any help is appreciated.

This issue has been fixed in sparksql-scalapb 0.9.0. Please see the updated documentation on setting up the imports so that an Encoder for ByteString is picked up by implicit search.
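For anyone landing here, a minimal sketch of what the updated setup might look like, reusing the Event class, inputFile, and "Body" column from the question (the exact import is described in the sparksql-scalapb documentation; with it in scope, the hand-rolled Kryo encoder and the UDT registration should no longer be needed):

import scalapb.spark.Implicits._ // ScalaPB's encoders; used instead of spark.implicits._ to avoid ambiguous implicits

val eventsDf = spark.read.format("avro").load(inputFile)

// Encoder[Event] is now resolved by implicit search, ByteString fields included
val events = eventsDf.map(row => Event.parseFrom(row.getAs[Array[Byte]]("Body")))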
