How to store the result of an action in Apache Spark using Scala
How do I store the result generated from an action like count in an output directory, in Apache Spark with Scala?
val countval = data.map((_, "")).reduceByKey(_ + _).count
The command below does not work, because the count is not stored as an RDD:
countval.saveAsTextFile("OUTPUT LOCATION")
Is there any way to store countval in a local/HDFS location?
After you call count, it is no longer an RDD. count returns just a Long, and Long does not have a saveAsTextFile method. If you want to store your countval, you have to do it as you would with any other Long, String, Int, and so on.
What @szefuf said is correct: after count you have a Long, which you can save any way you want. If you want to save it as an RDD with .saveAsTextFile(), you have to convert it to an RDD first:
sc.parallelize(Seq(countval)).saveAsTextFile("/file/location")
The parallelize method in SparkContext turns a collection of values into an RDD, so you need to turn the single value into a single-element sequence first. Then you can save it.