
Fastest serialization/deserialization of Scala case classes

If I've got a nested object graph of case classes, similar to the example below, and I want to store collections of them in a Redis list, what libraries or tools should I look at that will give the fastest overall round trip to Redis?

This will include:

  • time to serialize the item
  • network cost of transferring the serialized data
  • network cost of retrieving stored serialized data
  • time to deserialize back into case classes

    case class Person(name: String, age: Int, children: List[Person]) {}
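A rough way to compare candidates on the components listed above is a small harness that records payload size and the time spent in each direction. This is only a sketch: `RoundTripTimer` and `Measurement` are hypothetical names, JDK serialization stands in for whichever library is being tested, and the Redis network hop is not measured (payload size is the proxy for it).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

case class Person(name: String, age: Int, children: List[Person])

object RoundTripTimer {
  // Hypothetical result holder: payload size plus nanoseconds spent in each direction.
  final case class Measurement(bytes: Int, serializeNanos: Long, deserializeNanos: Long)

  // JDK serialization is only a stand-in codec here; swap in the library under test.
  def encode(people: List[Person]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(people) finally out.close()
    buffer.toByteArray
  }

  def decode(bytes: Array[Byte]): List[Person] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[List[Person]] finally in.close()
  }

  // Times one encode and one decode, and checks that the data survived the round trip.
  def measure(people: List[Person]): Measurement = {
    val t0 = System.nanoTime()
    val bytes = encode(people)
    val t1 = System.nanoTime()
    val decoded = decode(bytes)
    val t2 = System.nanoTime()
    require(decoded == people, "round trip changed the data")
    Measurement(bytes.length, t1 - t0, t2 - t1)
  }
}
```

A single call is dominated by JIT warm-up, so for real numbers you would repeat the measurement many times or use a benchmarking tool such as JMH.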

UPDATE (2018): scala/pickling is no longer actively maintained. Hordes of other libraries have arisen as alternatives. They take similar approaches but tend to focus on specific serialization formats, e.g., JSON, binary, or protobuf.

Your use case is exactly the targeted use case for scala/pickling (https://github.com/scala/pickling). Disclaimer: I'm an author.

Scala/pickling was designed to be a faster, more typesafe, and more open alternative to automatic frameworks like Java serialization or Kryo. It was built in particular for distributed applications, so serialization/deserialization time and serialized data size take a front seat. It takes a different approach to serialization altogether: it generates pickling (serialization) code inline at the use site at compile time, so it's really very fast.

The latest benchmarks are in our OOPSLA paper. For the binary pickle format (you can also choose others, like JSON), scala/pickling is consistently faster than Java serialization and Kryo, and produces binary representations that are on par with or smaller than Kryo's, meaning less latency when passing your pickled data over the network.

For more info, there's a project page: http://lampwww.epfl.ch/~hmiller/pickling

There is also a ScalaDays 2013 talk from June on Parleys.

We'll also be presenting some new developments at Strange Loop 2013, in particular around sending closures over the network, in case that is also a pain point for your use case.

As of the time of this writing, scala/pickling is in pre-release, with our first stable release planned for August 21st.

Update:

Be careful if you use the serialization methods from the JDK. The performance is not great, and one small change to your class can make previously stored data impossible to deserialize.
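One common mitigation for the breakage described above is to pin the class's serial version UID, so that compatible changes (such as adding a field) do not invalidate stored data. A minimal sketch follows; `Snapshot` and `UidCheck` are hypothetical names used only for illustration.

```scala
import java.io.ObjectStreamClass

// Without the annotation the JVM derives the UID from the class shape,
// so adding a field or method usually changes it and old data is rejected.
@SerialVersionUID(1L)
case class Snapshot(a: String, b: String)

object UidCheck {
  // Reads the UID that Java serialization will write into the stream for this class.
  def uidOf(cls: Class[_]): Long = ObjectStreamClass.lookup(cls).getSerialVersionUID
}
```

Pinning the UID only helps with changes that Java serialization treats as compatible. Removing a field's meaning or changing its type can still break old data.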


I've used scala/pickling, but it takes a global lock while serializing and deserializing.

So instead of using it, I wrote my own serialization/deserialization code like this:

import java.io._

object Serializer {

  // Serializes any java.io.Serializable value (case classes qualify) to a byte array.
  def serialize[T <: Serializable](obj: T): Array[Byte] = {
    val byteOut = new ByteArrayOutputStream()
    val objOut = new ObjectOutputStream(byteOut)
    try objOut.writeObject(obj)
    finally objOut.close() // flushes and also closes the underlying byteOut
    byteOut.toByteArray
  }

  // Reads a value back from bytes. The cast to T is not checked against the stream contents.
  def deserialize[T <: Serializable](bytes: Array[Byte]): T = {
    val objIn = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try objIn.readObject().asInstanceOf[T]
    finally objIn.close()
  }
}

Here is an example of using it:

case class Example(a: String, b: String)

val obj = Example("a", "b")
val bytes = Serializer.serialize(obj)
val obj2 = Serializer.deserialize[Example](bytes)

According to the uPickle benchmarks: "uPickle runs 30-50% faster than Circe for reads/writes, and ~200% faster than play-json" for serializing case classes.

It's easy to use. Here's how to serialize a case class to a JSON string:

case class City(name: String, funActivity: String, latitude: Double)
val bengaluru = City("Bengaluru", "South Indian food", 12.97)
implicit val cityRW = upickle.default.macroRW[City]
upickle.default.write(bengaluru) // "{\"name\":\"Bengaluru\",\"funActivity\":\"South Indian food\",\"latitude\":12.97}"

You can also serialize to binary or other formats.
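For the binary route, a minimal sketch using uPickle's `writeBinary` and `readBinary` (which produce and consume MessagePack bytes) might look like the following. `BinaryDemo` is a hypothetical wrapper, and the uPickle dependency is assumed to be on the classpath.

```scala
import upickle.default.{ReadWriter, macroRW, readBinary, writeBinary}

case class City(name: String, funActivity: String, latitude: Double)

object City {
  // Explicit type annotation keeps implicit resolution predictable.
  implicit val rw: ReadWriter[City] = macroRW
}

object BinaryDemo {
  // Encodes to a byte array, which is what you would store in Redis.
  def toBytes(city: City): Array[Byte] = writeBinary(city)

  // Decodes the bytes back into the case class.
  def fromBytes(bytes: Array[Byte]): City = readBinary[City](bytes)
}
```

The byte array can be stored as-is by any Redis client that accepts binary values.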

The accepted answer from 2013 proposes a library that is no longer maintained. There are many similar questions on Stack Overflow, but I couldn't find a good answer that meets the following criteria:

  • serialization/deserialization should be fast
  • high-performance data exchange over the wire, where you only encode as much metadata as you need
  • supports schema evolution, so that changing the serialized object (e.g., a case class) doesn't break deserialization of previously stored data

I recommend against using low-level JDK serialization (ObjectOutputStream and ObjectInputStream over byte-array streams). Supporting schema evolution becomes a pain, and it's difficult to make it work with external services (e.g., Thrift), since you cannot control whether the data sent back was written with the same kind of stream.

Some people use JSON, with libraries like json4s, but it is not well suited to message transfer in distributed systems. It marshals data as a JSON string, so it is both slower and less storage-efficient, since every character in the string takes at least 8 bits.
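To make the size argument concrete, here is a small sketch that encodes the same two fields as JSON text and as a simple length-prefixed binary layout. Both encoders are hand-rolled for illustration only: the JSON one does no escaping, and the binary layout is an assumption of this sketch, not the MessagePack format.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object EncodingSize {
  // JSON text: field names are repeated in every record and digits cost one byte each.
  def jsonBytes(message: String, number: Long): Array[Byte] =
    s"""{"message":"$message","number":$number}""".getBytes(UTF_8)

  // Binary: a 4-byte length prefix, the UTF-8 text, then a fixed 8-byte long.
  def binaryBytes(message: String, number: Long): Array[Byte] = {
    val text = message.getBytes(UTF_8)
    ByteBuffer.allocate(4 + text.length + 8)
      .putInt(text.length)
      .put(text)
      .putLong(number)
      .array()
  }
}
```

For `("Hello world!", 1234567890123L)` the JSON form is 49 bytes and the binary form is 24 bytes, and the gap grows with the number of fields because the binary layout carries no field names.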

I highly recommend using the MessagePack binary serialization format, and reading the spec to understand the encoding specifics. It has implementations in many different languages. Here's a generic example I wrote for a Scala case class that you can copy and paste into your code.

import java.nio.ByteBuffer
import java.util.concurrent.TimeUnit

import org.msgpack.core.MessagePack

case class Data(message: String, number: Long, timeUnit: TimeUnit, price: Long)

object Data extends App {

  def serialize(data: Data): ByteBuffer = {
    val packer = MessagePack.newDefaultBufferPacker
    packer
      .packString(data.message)
      .packLong(data.number)
      .packString(data.timeUnit.toString)
      .packLong(data.price)
    packer.close()
    ByteBuffer.wrap(packer.toByteArray)
  }

  def deserialize(data: ByteBuffer): Data = {
    val unpacker = MessagePack.newDefaultUnpacker(convertDataToByteArray(data))
    val newdata = Data.apply(
      message = unpacker.unpackString(),
      number = unpacker.unpackLong(),
      timeUnit = TimeUnit.valueOf(unpacker.unpackString()),
      price = unpacker.unpackLong()
    )
    unpacker.close()
    newdata
  }

  def convertDataToByteArray(data: ByteBuffer): Array[Byte] = {
    val buffer = Array.ofDim[Byte](data.remaining())
    data.duplicate().get(buffer)
    buffer
  }

  println(deserialize(serialize(Data("Hello world!", 1L, TimeUnit.DAYS, 3L))))
}

It will print:

Data(Hello world!,1,DAYS,3)
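The same approach can be extended to the nested Person class from the question. The sketch below is one possible layout, not the only one: `PersonCodec` is a hypothetical name, and writing each person as a three-element array `[name, age, [children...]]` is an assumption of this example. It uses the same msgpack-core packer and unpacker calls as above, plus `packArrayHeader` and `unpackArrayHeader`.

```scala
import org.msgpack.core.{MessagePack, MessagePacker, MessageUnpacker}

case class Person(name: String, age: Int, children: List[Person])

// Hypothetical codec; the three-element array layout is an assumption of this sketch.
object PersonCodec {

  // Each person becomes [name, age, [children...]], written depth-first.
  private def pack(packer: MessagePacker, person: Person): Unit = {
    packer.packArrayHeader(3)
    packer.packString(person.name)
    packer.packInt(person.age)
    packer.packArrayHeader(person.children.size)
    person.children.foreach(child => pack(packer, child))
  }

  // Mirrors pack: reads the header, the two scalar fields, then each child in order.
  private def unpack(unpacker: MessageUnpacker): Person = {
    val fields = unpacker.unpackArrayHeader()
    require(fields == 3, s"expected 3 fields but found $fields")
    val name = unpacker.unpackString()
    val age = unpacker.unpackInt()
    val childCount = unpacker.unpackArrayHeader()
    Person(name, age, List.fill(childCount)(unpack(unpacker)))
  }

  def serialize(person: Person): Array[Byte] = {
    val packer = MessagePack.newDefaultBufferPacker()
    try {
      pack(packer, person)
      packer.flush()
      packer.toByteArray
    } finally packer.close()
  }

  def deserialize(bytes: Array[Byte]): Person = {
    val unpacker = MessagePack.newDefaultUnpacker(bytes)
    try unpack(unpacker) finally unpacker.close()
  }
}
```

Because the recursion follows the object graph, a very deep tree could exhaust the stack, so this sketch suits shallow family-style graphs like the one in the question.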
