
How to convert RDD to DF in spark scala?

I am new to Spark, and I am trying to convert the RDD below to a DataFrame, but have not succeeded:

val customerRDD = sc.textFile("file:///home/hduser/data//customer.txt")
//custId,CustName,CustEmail,CustPhone
//1,ABC,abc@gmail.com,+199240242234

Here I tried the customerRDD.toDF() method, but it does not work.

I have also tried the createDataFrame() method, but could not figure out how to use it.

Can anyone help me convert the RDD to a DF here?

Thanks

An odd way of doing things these days, but if you must use an RDD to read a file with a header, then consult https://sparkbyexamples.com/apache-spark-rdd/spark-load-csv-file-into-rdd/ and note specifically:

  • Skip the header of each file (as shown there)
  • Extract the columns yourself via map (as shown there)
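Following the two bullets above, a minimal sketch, assuming the file layout from the question (a header line `custId,CustName,CustEmail,CustPhone` followed by comma-separated rows), could look like this; the `Customer` case class and the `local[*]` master are illustrative assumptions, not part of the original question:

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session for the sketch; in a spark-shell, `sc` already exists.
val spark = SparkSession.builder().appName("rdd-to-df").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Read the raw lines, then drop the header row.
val raw = sc.textFile("file:///home/hduser/data//customer.txt")
val header = raw.first()
val rows = raw.filter(_ != header)

// Extract the columns yourself via map, into a (hypothetical) case class.
case class Customer(custId: Int, custName: String, custEmail: String, custPhone: String)
val customers = rows.map { line =>
  val f = line.split(",")
  Customer(f(0).toInt, f(1), f(2), f(3))
}
```

With an `RDD[Customer]` in hand, the conversion in the next step becomes straightforward, since Spark can infer a schema from a case class.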

For creating a DF from an RDD with a schema using Structs, see https://sparkbyexamples.com/apache-spark-rdd/convert-spark-rdd-to-dataframe-dataset. You can

  • create a schema programmatically for a DF from RDD via createDataFrame()
  • or use a default schema with implicits
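Both options above can be sketched as follows, assuming the same header-stripped `RDD[String]` as in the question (column names and the `local[*]` master are illustrative assumptions). Note that a bare `customerRDD.toDF()` fails because `toDF()` works on RDDs of case classes or tuples, not on `RDD[String]`:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("rdd-to-df").master("local[*]").getOrCreate()
import spark.implicits._

// Header-stripped lines, as in the question's file.
val raw = spark.sparkContext.textFile("file:///home/hduser/data//customer.txt")
val header = raw.first()
val rows = raw.filter(_ != header)

// Option 1: programmatic schema via StructType + createDataFrame().
val schema = StructType(Seq(
  StructField("custId", StringType, nullable = true),
  StructField("custName", StringType, nullable = true),
  StructField("custEmail", StringType, nullable = true),
  StructField("custPhone", StringType, nullable = true)
))
val rowRDD = rows.map(line => Row.fromSeq(line.split(",").toSeq))
val df1 = spark.createDataFrame(rowRDD, schema)

// Option 2: default schema with implicits -- map to tuples, then toDF().
val df2 = rows.map { line =>
  val f = line.split(",")
  (f(0), f(1), f(2), f(3))
}.toDF("custId", "custName", "custEmail", "custPhone")
```

Option 1 gives you full control over column names, types, and nullability; Option 2 is shorter and lets Spark derive the schema from the tuple (or case class) element type.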
