
Spark serialization error using LocalDateTime

Code:

val rdd = sc.textFile("/tmp/abc.csv")
rdd.first.split(",").zipWithIndex               // inspect header columns and their indexes
val rows = rdd.filter(x => !x.contains("ID") && !x.contains("Case Number"))  // drop the header line
val split1 = rows.map(x => x.split(","))
split1.take(3)
import java.time._
import java.time.format._
val format = DateTimeFormatter.ofPattern("MM/dd/yyyy h:m:s a")
val dates = split1.map(x => LocalDateTime.parse(x(2), format))  // fails: the closure captures format

Error:

org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
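
The root cause is that java.time.format.DateTimeFormatter does not implement java.io.Serializable. The map closure captures the driver-side val format (in spark-shell, through the enclosing REPL line object), and Spark's ClosureCleaner rejects the closure when it checks it for serializability. As a minimal sketch, runnable in a plain Scala REPL without Spark, the same failure can be reproduced with ordinary JVM serialization:

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.time.format.DateTimeFormatter

val format = DateTimeFormatter.ofPattern("MM/dd/yyyy h:m:s a")
try {
  // Plain JVM serialization fails the same way Spark's closure check does.
  new ObjectOutputStream(new ByteArrayOutputStream).writeObject(format)
} catch {
  case e: NotSerializableException => println(s"not serializable: ${e.getMessage}")
}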

A rather ugly way to handle this is to push the formatter initialization inside the anonymous function:

split1.map(x => 
  LocalDateTime.parse(x(2), DateTimeFormatter.ofPattern("MM/dd/yyyy h:m:s a")))
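
A cleaner alternative, sketched here against the same split1 RDD from the question (one option, not the only fix), is mapPartitions: the formatter is built once per partition on the executor, so nothing non-serializable is captured, and it is not re-created for every record:

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val dates = split1.mapPartitions { iter =>
  // Runs on the executor, once per partition; format never leaves this JVM.
  val format = DateTimeFormatter.ofPattern("MM/dd/yyyy h:m:s a")
  iter.map(x => LocalDateTime.parse(x(2), format))
}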
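
In a compiled application (outside spark-shell, whose class-based wrapping of top-level definitions can reintroduce the capture), holding the formatter in a top-level object also works, since the closure then references the module statically instead of capturing a driver-side instance. Formats below is a hypothetical helper name:

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object Formats {
  // Initialized independently on each JVM that first touches it; never serialized.
  val format: DateTimeFormatter = DateTimeFormatter.ofPattern("MM/dd/yyyy h:m:s a")
}

val dates = split1.map(x => LocalDateTime.parse(x(2), Formats.format))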
