Scala: Product with Serializable does not take parameters
My objective is to read data from a CSV file and convert my RDD to a DataFrame in Scala/Spark. This is my code:
package xxx.DataScience.CompensationStudy
import org.apache.spark._
import org.apache.log4j._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
object CompensationAnalysis {

  case class GetDF(profil_date: String, profil_pays: String, param_tarif2: String, param_tarif3: String,
                   dt_titre: String, dt_langues: String, dt_diplomes: String, dt_experience: String,
                   dt_formation: String, dt_outils: String, comp_applications: String,
                   comp_interventions: String, comp_competence: String)

  def main(args: Array[String]) {

    Logger.getLogger("org").setLevel(Level.ERROR)

    val conf = new SparkConf().setAppName("CompensationAnalysis")
    val sc = new SparkContext(conf)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val lines = sc.textFile("C:/Users/../Downloads/CompensationStudy.csv").flatMap { l =>
      l.split(",") match {
        case field: Array[String] if field.size > 13 => Some(field(0), field(1), field(2), field(3), field(4), field(5), field(6), field(7), field(8), field(9), field(10), field(11), field(12))
        case field: Array[String] if field.size == 1 => Some((field(0), "default value"))
        case _ => None
      }
    }
At this stage, I get the error: Product with Serializable does not take parameters
    val summary = lines.collect().map(x => GetDF(x("profil_date"), x("profil_pays"), x("param_tarif2"), x("param_tarif3"), x("dt_titre"), x("dt_langues"), x("dt_diplomes"), x("dt_experience"), x("dt_formation"), x("dt_outils"), x("comp_applications"), x("comp_interventions"), x("comp_competence")))

    val sum_df = summary.toDF()

    sum_df.printSchema
  }
}
Help please?
You have several things you should improve. The most urgent problem, which causes the exception, is, as @CyrilleCorpet points out: "the three different lines in the pattern matching return values of types Some[Tuple13], Some[Tuple2], and None.type. The least-upper-bound is then Option[Product with Serializable], which complies with flatMap's signature (where the result should be an Iterable[T]) modulo some implicit conversion."
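A minimal sketch of that inference (hypothetical values, not the original data): the compiler unifies the three branches to their least upper bound, and the later call x("profil_date") then tries to apply a Product with Serializable value as if it were a function, which is exactly what the error message complains about.

def branches(n: Int) = n match {
  case 1 => Some((1, 2))     // Some[(Int, Int)]
  case 2 => Some((1, 2, 3))  // Some[(Int, Int, Int)]
  case _ => None             // None.type
}
// Inferred result type: Option[Product with Serializable]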
Basically, if you had Some[Tuple13], Some[Tuple13], and None, or Some[Tuple2], Some[Tuple2], and None, you would be better off.
Also, pattern matching on types is generally a bad idea because of type erasure, and pattern matching isn't even great anyway for your situation.
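For example (a hypothetical illustration; note that Array is a partial exception because JVM arrays keep their element type at runtime, but the point holds for generic types like List):

def describe(x: Any): String = x match {
  case _: List[String] => "a list of strings" // unchecked: String is erased at runtime
  case _               => "something else"
}

describe(List(1, 2, 3)) // returns "a list of strings" despite the Int elements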
So you could set default values in your case class:
case class GetDF(profil_date: String,
                 profil_pays: String = "default",
                 param_tarif2: String = "default",
                 ...
                )
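Spelled out with all thirteen fields from the question's case class (the "default" strings are just placeholder values):

case class GetDF(profil_date: String,
                 profil_pays: String = "default",
                 param_tarif2: String = "default",
                 param_tarif3: String = "default",
                 dt_titre: String = "default",
                 dt_langues: String = "default",
                 dt_diplomes: String = "default",
                 dt_experience: String = "default",
                 dt_formation: String = "default",
                 dt_outils: String = "default",
                 comp_applications: String = "default",
                 comp_interventions: String = "default",
                 comp_competence: String = "default")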
Then in your lambda:
val tokens = l.split(",")

if (tokens.length > 13) {
  Some(GetDF(tokens(0), tokens(1), tokens(2), ...))
} else if (tokens.length == 1) {
  Some(GetDF(tokens(0)))
} else {
  None
}
Now in all cases you are returning Option[GetDF]. You can flatMap the RDD to get rid of all the Nones and keep only GetDF instances.
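Putting it together, here is a minimal sketch of the corrected pipeline (reusing the file path, SparkContext, and sqlContext.implicits._ import from the question; the CSV column order is assumed to match the case class declaration):

val summary = sc.textFile("C:/Users/../Downloads/CompensationStudy.csv").flatMap { l =>
  val tokens = l.split(",")
  if (tokens.length > 13) {
    // Full row: populate every field (same guard as in the original code)
    Some(GetDF(tokens(0), tokens(1), tokens(2), tokens(3), tokens(4), tokens(5), tokens(6),
      tokens(7), tokens(8), tokens(9), tokens(10), tokens(11), tokens(12)))
  } else if (tokens.length == 1) {
    // Single-column row: the remaining fields fall back to their defaults
    Some(GetDF(tokens(0)))
  } else {
    None
  }
}

val sum_df = summary.toDF() // summary is an RDD[GetDF], so toDF works via the implicits
sum_df.printSchema()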