How do I convert Array[Row] to RDD[Row]?

I have a scenario where I want to convert the result of a DataFrame, which is in the format Array[Row], to RDD[Row]. I have tried using parallelize, but I don't want to use it because it requires the entire data to be held on a single system, which is not feasible on a production box.

val Bid = spark.sql("select Distinct DeviceId, ButtonName  from stb").collect()
val bidrdd = sparkContext.parallelize(Bid)

How do I achieve this? I tried the approach given in this link (How to convert DataFrame to RDD in Scala?), but it didn't work for me.

val bidrdd1 = Bid.map(x => (x(0).toString, x(1).toString)).rdd

It gives the error: value rdd is not a member of Array[(String, String)]

The variable Bid which you've created here is not a DataFrame; it is an Array[Row], which is why you can't use .rdd on it. If you want to get an RDD[Row], simply call .rdd on the DataFrame (without calling collect):

val rdd = spark.sql("select Distinct DeviceId, ButtonName  from stb").rdd
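As a minimal sketch of the above, assuming a local SparkSession and a small made-up stb table (the sample rows, the local[*] master, and the names rowRdd and pairRdd are assumptions for illustration, not taken from the question), the tuples the question was trying to build can be produced by mapping over the RDD rather than over a collected array:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("rows-to-rdd").getOrCreate()
import spark.implicits._

// Assumption: made-up sample data standing in for the real `stb` table.
Seq(("d1", "play"), ("d1", "play"), ("d2", "stop"))
  .toDF("DeviceId", "ButtonName")
  .createOrReplaceTempView("stb")

// No collect(): the rows are never gathered into the driver's memory.
val rowRdd: RDD[Row] = spark.sql("select distinct DeviceId, ButtonName from stb").rdd

// The per-row conversion from the question, applied to the RDD instead of an Array.
val pairRdd: RDD[(String, String)] = rowRdd.map(r => (r.getString(0), r.getString(1)))
```

Here getString is used because both sample columns are strings; for columns of other types, the x(0).toString form from the question would be the closer equivalent.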

Your post contains some misconceptions worth noting:

... a dataframe which is in the format Array[Row] ...

Not quite: the Array[Row] is the result of collecting the data from the DataFrame into driver memory; it is not a DataFrame.

... I don't want to use it as it needs to contain entire data in a single system ...

Note that as soon as you use collect on the DataFrame, you've already collected the entire data into a single JVM's memory (the driver's). So using parallelize is not the issue; calling collect is.
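To make the distinction concrete, here is a small sketch of the types involved. It assumes a local SparkSession and two made-up rows; the names df, onDriver, roundTrip and direct are hypothetical and do not come from the question:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("collect-vs-rdd").getOrCreate()
import spark.implicits._

// Assumption: made-up rows standing in for the query result.
val df: DataFrame = Seq(("d1", "play"), ("d2", "stop")).toDF("DeviceId", "ButtonName")

// collect() is an action: every row is shipped to the driver JVM.
val onDriver: Array[Row] = df.collect()

// parallelize() only redistributes rows that are already in driver memory.
val roundTrip: RDD[Row] = spark.sparkContext.parallelize(onDriver)

// .rdd exposes the same rows without the detour through the driver.
val direct: RDD[Row] = df.rdd
```

Both roundTrip and direct are RDD[Row] values holding the same rows; the difference is only that the first one required the whole result to fit in the driver.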
