How to create a DataFrame from an array in Scala?
I have a use case where I need to create a DataFrame from an array.
I've created a DataFrame that reads a CSV, then I am using a map to process/transform it further.
var mapTransform = df1.collect.map(
  line => {
    // line.split(",") logic for field separation
    // transformation logic here for various fields
    (field1 + "," + field2 + "," + field3)
  }
)
From this, I am getting an array (Array[String]) which is the transformed result.
I want to further convert it to a DataFrame with separate columns so that it can later be used to write to a DB or file; however, I am facing an issue. Is it possible to do this? Any solutions?
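A minimal sketch of what the question seems to be after, assuming a SparkSession named spark is in scope; the column names, the comma delimiter, and the sample data are assumptions, not from the question:

```scala
import spark.implicits._

// Placeholder standing in for the Array[String] produced by the map above
val mapTransform: Array[String] = Array("a,1,x", "b,2,y")

val df2 = spark.sparkContext
  .parallelize(mapTransform.toSeq)   // distribute the array as an RDD[String]
  .map(_.split(","))                 // split each line back into fields
  .map(a => (a(0), a(1), a(2)))      // a tuple lets toDF infer three columns
  .toDF("field1", "field2", "field3")
```

From here df2 can be written out with the usual DataFrameWriter calls (df2.write.csv(...), df2.write.jdbc(...), etc.).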
This does your job:
spark.sparkContext.parallelize(mapTransform.toSeq)
But note that you should avoid methods that produce non-RDD results (such as collect), as they load the entire contents of the array onto a single node, which is inefficient in the general case.
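Following that advice, the collect in the question can be avoided entirely by keeping the transformation distributed. A hedged sketch, assuming df1 has a single string column (as produced by spark.read.text) and that spark is the SparkSession; column names are assumptions:

```scala
import spark.implicits._

val transformed = df1.rdd
  .map(row => row.getString(0).split(","))   // field separation stays on the cluster
  .map(a => (a(0), a(1), a(2)))              // per-field transformation logic goes here
  .toDF("field1", "field2", "field3")        // back to a DataFrame, no collect needed
```

The data never has to fit on the driver this way, so it scales to inputs larger than one machine's memory.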
Also, there's a convention to turn vars into vals as much as possible.
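Applied to the snippet in the question, that would simply be:

```scala
// val instead of var: the reference is never reassigned
val mapTransform = df1.collect.map(line => /* ... */ line)
```

Immutable references make it easier to reason about the code and are idiomatic Scala.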