
Dataframe to RDD piece of code is not working

I am trying to read each row of a dataframe and convert the row data into a custom bean class. The problem is that the code is not getting executed: to check, I added multiple print statements, but none of the ones inside `df.rdd.map { row => ... }` ever ran, as if the whole block of code were skipped.

Code snippet:

    print("data frame:", df.show())

    df.rdd.map(row => {
      // Debugging
      println("Debugging")

      if (row.isNullAt(0)) {
        println("null data")
      } else {
        println(row.get(0).toString)
      }

      val employeeJobData = new EmployeeJobData

      if (row.get(0).toString == null || row.get(0).toString.isEmpty) {
        employeeJobData.setEmployeeId("NULL_KEY_VALUE")
      } else {
        employeeJobData.setEmployeeId(row.get(0).toString)
      }
      employeeJobDataList.add(employeeJobData)
    })

Output of `df.show()`:

    +-----------+-------------+--------------+--------+-----+-------+
    |employee_id|employee_name|employee_email|paygroup|level|dept_id|
    +-----------+-------------+--------------+--------+-----+-------+
    |13         |         null|          null|    null| null|   null|
    |14         |         null|          null|    null| null|   null|
    |15         |         null|          null|    null| null|   null|
    |16         |         null|          null|    null| null|   null|
    |17         |         null|          null|    null| null|   null|
    +-----------+-------------+--------------+--------+-----+-------+
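For context on why nothing prints: `map` on an RDD is a lazy transformation, so the closure only runs once an action forces evaluation. The same effect can be seen with a plain Scala `Iterator`, no Spark needed (a minimal sketch; the object and value names here are illustrative only):

```scala
object LazyMapDemo extends App {
  // Like an RDD transformation, Iterator.map is lazy: the closure
  // does not run when map is called, only when the result is consumed.
  val mapped = Iterator(1, 2, 3).map { x =>
    println(s"mapping $x") // not printed at the point map is called
    x * 2
  }

  println("before forcing")       // printed first, before any "mapping ..." line
  val result = mapped.toList      // consuming the iterator runs the closure
  println(result)                 // List(2, 4, 6)
}
```

In Spark the analogous fix is to end the chain with an action such as `collect()` or `foreach(...)`, as the answer below does with `collectAsList`.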

Your `map` never runs because RDD transformations are lazy: nothing executes until an action (such as `collect` or `foreach`) is called, and mutating a driver-side list from inside `map` would not work on the executors anyway. You can remove the unnecessary code as below and get a `java.util.List[EmployeeJobData]`:

    import java.util

    object MapToCaseClass {

      def main(args: Array[String]): Unit = {
        val spark = Constant.getSparkSess

        import spark.implicits._

        // Note: the first column must be a String here, because the mapper calls row.getString(0)
        val df = List(("12", "name", "email@email.com", "paygroup", "level", "dept_id")).toDF()
        val employeeList: util.List[EmployeeJobData] = df
          .map(row => {
            val id = if (null == row.getString(0) || "null".equals(row.getString(0)) || row.getString(0).trim.isEmpty) {
              "NULL_KEY_VALUE"
            } else {
              row.getString(0)
            }
            EmployeeJobData(id, row.getString(1), row.getString(2),
              row.getString(3), row.getString(4), row.getString(5))
          })
          .collectAsList  // the action that actually triggers execution
      }

    }

    case class EmployeeJobData(employee_id: String, employee_name: String, employee_email: String,
                               paygroup: String, level: String, dept_id: String)

The above can be improved further by setting the data type of `employee_id` and `dept_id` to `Long` (if they are numeric). The `"null".equals` and `.isEmpty` checks for `employee_id` can then be avoided, and the code can be reduced further.
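A minimal sketch of that improvement (the `Long`-typed case class variant, the `-1L` sentinel, and the helper name are assumptions for illustration, not part of the original answer):

```scala
// Hypothetical variant of the case class with the numeric columns typed as Long.
case class EmployeeJobDataTyped(employee_id: Long, employee_name: String,
                                employee_email: String, paygroup: String,
                                level: String, dept_id: Long)

// With Long-typed columns the "null".equals / .isEmpty string checks vanish;
// a single null guard per numeric field is all that remains.
def idOrDefault(id: java.lang.Long): Long =
  if (id == null) -1L else id.longValue() // -1L is an assumed sentinel value
```

In the Spark mapper this would read, for example, `idOrDefault(row.getAs[java.lang.Long]("employee_id"))` in place of the string comparisons.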
