Dataframe to RDD piece of code is not working
I am trying to read each row of a dataframe and convert the row data into a custom bean class. The problem is that the code is not being executed. To check, I added several print statements, but none of the print statements inside df.rdd.map{row=>} ran, as if the whole block of code were skipped.
Code snippet:
print("data frame:", df.show())
df.rdd.map(row => {
  // Debugging
  println("Debugging")
  if (row.isNullAt(0)) {
    println("null data")
  } else {
    println(row.get(0).toString)
  }
  val employeeJobData = new EmployeeJobData
  if (row.get(0).toString == null || row.get(0).toString.isEmpty) {
    employeeJobData.setEmployeeId("NULL_KEY_VALUE")
  } else {
    employeeJobData.setEmployeeId(row.get(0).toString)
  }
  employeeJobDataList.add(employeeJobData)
})
Output of df.show():
+-----------+-------------+--------------+--------+-----+-------+
|employee_id|employee_name|employee_email|paygroup|level|dept_id|
+-----------+-------------+--------------+--------+-----+-------+
|13         |         null|          null|    null| null|   null|
|14         |         null|          null|    null| null|   null|
|15         |         null|          null|    null| null|   null|
|16         |         null|          null|    null| null|   null|
|17         |         null|          null|    null| null|   null|
+-----------+-------------+--------------+--------+-----+-------+
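A likely reason nothing is printed: map on an RDD is a lazy transformation, so its body only runs once an action such as collect or count is called on the result. The snippet in the question never calls one. In addition, adding to a driver-side list from inside the closure is unreliable, because on a cluster the closure is serialized and each executor mutates its own copy. The following is a minimal sketch of that behaviour, not the asker's actual job; the object name LazyMapSketch, the local[*] master and the sample ids are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object LazyMapSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lazy-map-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with a single string column.
    val df = Seq("13", "14", "15").toDF("employee_id")

    // Transformation only: this records what to do, the body has not run yet.
    val ids = df.rdd.map { row =>
      // Runs on an executor once an action is triggered. In local mode it
      // shows up on the console, on a cluster it goes to the executor logs.
      println("Debugging")
      if (row.isNullAt(0)) "NULL_KEY_VALUE" else row.getString(0)
    }

    // Action: triggers the map and returns the results to the driver.
    val collected: Array[String] = ids.collect()
    collected.foreach(println)

    spark.stop()
  }
}
```

Returning a value from map and collecting it keeps the result on the driver without relying on shared mutable state.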
You can remove the unnecessary code as shown below and get a java.util.List[EmployeeJobData]:
import java.util

object MapToCaseClass {

  def main(args: Array[String]): Unit = {
    // Constant.getSparkSess is the answerer's own helper; it is assumed to return a SparkSession.
    val spark = Constant.getSparkSess
    import spark.implicits._

    // employee_id is a String here so that row.getString(0) below is valid.
    val df = List(("12", "name", "email@email.com", "paygroup", "level", "dept_id")).toDF()

    // collectAsList is an action, so the map actually runs and the results come back to the driver.
    val employeeList: util.List[EmployeeJobData] = df
      .map(row => {
        val id = if (null == row.getString(0) || "null".equals(row.getString(0)) || row.getString(0).trim.isEmpty) {
          "NULL_KEY_VALUE"
        } else {
          row.getString(0)
        }
        EmployeeJobData(id, row.getString(1), row.getString(2),
          row.getString(3), row.getString(4), row.getString(5))
      })
      .collectAsList
  }
}

case class EmployeeJobData(employee_id: String, employee_name: String, employee_email: String, paygroup: String,
                           level: String, dept_id: String)
The above can be improved further by setting the data type of employee_id and dept_id to Long (if they are numeric). The "null".equals and .isEmpty() checks on employee_id can then be avoided, and the code can be reduced further.
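One way this could look, as a sketch only: it assumes employee_id really is a non-null numeric column and that dept_id is numeric but may be null. The case class EmployeeJobDataTyped, the object name and the sample rows are hypothetical, and the sketch uses Dataset.as instead of a hand-written map, which is a different technique from the row-by-row mapping shown earlier.

```scala
import java.util

import org.apache.spark.sql.SparkSession

// Hypothetical variant of the case class with numeric ids.
// Option[Long] lets dept_id be null without any string checks.
case class EmployeeJobDataTyped(employee_id: Long, employee_name: String,
                                employee_email: String, paygroup: String,
                                level: String, dept_id: Option[Long])

object TypedIdsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-ids-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; the column names match the case class fields.
    val df = Seq[(Long, String, String, String, String, Option[Long])](
      (13L, "name", "email@email.com", "paygroup", "level", Some(1L)),
      (14L, null, null, null, null, None)
    ).toDF("employee_id", "employee_name", "employee_email",
      "paygroup", "level", "dept_id")

    // as[...] maps columns to fields by name; collectAsList is the action
    // that runs the job and returns a java.util.List to the driver.
    val employees: util.List[EmployeeJobDataTyped] =
      df.as[EmployeeJobDataTyped].collectAsList()

    employees.forEach(e => println(e))

    spark.stop()
  }
}
```

With typed columns there is no string to compare against "null" or test for emptiness, so the NULL_KEY_VALUE branch disappears; whether a missing id should be allowed at all is a decision for the schema rather than the mapping code.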