Task not serializable when iterating through dataframe, scala
Below is my code; the error occurs when I try to iterate through each row:
val df: DataFrame = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use first line of all files as header
  .option("delimiter", TILDE)
  .option("inferSchema", "true") // Automatically infer data types
  .load(fileName._2)
val accGrpCountsIds: DataFrame = df.groupBy("accgrpid").count()
LOGGER.info(s"DataFrame Count - ${accGrpCountsIds.count()}")
accGrpCountsIds.show(3)
//switch based on file names and update the model.
accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  val rowCount = accGrpRow.getLong(1) // count() produces a Long column, so getInt(1) would fail
})
When I try to iterate through the DataFrame above using foreach, I get a "Task not serializable" error. How can I do this?
Do you have any other types in your foreach that you didn't share? Or is this all you do, and it still doesn't work?
accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  val rowCount = accGrpRow.getLong(1) // count() produces a Long column, so getInt(1) would fail
})
Also, you may find this useful: "Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects".
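To illustrate what that linked question is about: Spark has to serialize the function passed to foreach before shipping it to the executors, and a function that touches a member of its enclosing class drags the whole instance along with it. The sketch below reproduces that mechanism with plain Java serialization and no Spark at all. ClosureDemo, Job, prefix, capturing, local and serializes are all hypothetical names made up for the illustration, and whether this is the actual cause in the code above is only a guess, since the enclosing class was not shared (a member such as LOGGER referenced inside the foreach body would be a typical candidate).

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureDemo {
  // Hypothetical stand-in for the class that contains the Spark job.
  // It deliberately does NOT extend Serializable.
  class Job {
    val prefix = "acc-"

    // The function body reads the field `prefix`, so the closure
    // captures `this`, i.e. the whole non-serializable Job instance.
    def capturing: Long => String = id => prefix + id

    // The field is copied into a local val first, so the closure
    // captures only a String, which is serializable.
    def local: Long => String = {
      val p = prefix
      id => p + id
    }
  }

  // Attempts Java serialization of the given function object and
  // reports whether it succeeded.
  def serializes(f: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(f)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val job = new Job
    println(s"capturing closure serializes: ${serializes(job.capturing)}")
    println(s"local-copy closure serializes: ${serializes(job.local)}")
  }
}
```

If the goal is to update a model that lives on the driver, as the comment in the question suggests, another option worth considering is to call accGrpCountsIds.collect() and iterate over the resulting Array[Row] locally. That avoids shipping a closure altogether, but it only makes sense when the grouped result is small enough to fit in driver memory.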