
How to create a DataFrame from an array in Scala?

I have a use case where I need to create a DataFrame from an array.

I've created a DataFrame by reading a CSV, and then I use map to process/transform it further.

    var mapTransform = df1.collect.map { line =>
      // line.split(",") logic for field separation
      // transformation logic here for the various fields

      field1 + "," + field2 + "," + field3
    }
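For reference, df1 is just a plain CSV read. A minimal sketch of how it could be set up, assuming an existing SparkSession named spark; the path and the header option are placeholders:

    // assumption: spark is an existing SparkSession; path and options are placeholders
    val df1 = spark.read
      .option("header", "true")   // assumed: the CSV has a header row
      .csv("/path/to/input.csv")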

From this, I get an array (Array[String]) which is the transformed result.

I want to further convert it to a DataFrame with separate columns so that it can later be used to write to a DB or a file; however, I am facing an issue. Is it possible to do this? Any solutions?

This does your job: spark.sparkContext.parallelize(mapTransform.toSeq). But note that you should avoid methods that produce non-RDD results, as they load the entire contents of the array onto a single node, which is inefficient in the general case.
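For completeness, here is a minimal end-to-end sketch of turning that Array[String] into a DataFrame with separate columns, assuming each element is a comma-joined record with exactly three fields. The sample data, the column names field1/field2/field3, and the local-mode SparkSession are illustrative assumptions, not part of the original code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("ArrayToDataFrame")
      .master("local[*]")   // assumption: local mode, for the sketch only
      .getOrCreate()
    import spark.implicits._

    // stand-in for the Array[String] produced by the map in the question
    val mapTransform: Array[String] = Array("a,1,x", "b,2,y")

    // distribute the array as an RDD of lines, split each line into its fields,
    // and name the columns via toDF (the column names are illustrative)
    val resultDf = spark.sparkContext
      .parallelize(mapTransform.toSeq)
      .map(_.split(",", -1))
      .map { case Array(f1, f2, f3) => (f1, f2, f3) }
      .toDF("field1", "field2", "field3")

    resultDf.show()

From there the usual writers apply, e.g. resultDf.write.csv(...) or a JDBC write, which is what the question is ultimately after.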

Also, there's a convention of turning vars into vals as much as possible.
