
How to store the result of an action in Apache Spark using Scala

How do I store the result generated from an action such as count in an output directory, in Apache Spark with Scala?

    val countval = data.map((_, "")).reduceByKey(_ + _).count

The command below does not work, because the count is not stored as an RDD:

    countval.saveAsTextFile("OUTPUT LOCATION")

Is there any way to store countval in a local/HDFS location?

After you call count, the result is no longer an RDD.

The count is just a Long, and Long does not have a saveAsTextFile method.

If you want to store your countval, you have to do it as you would with any other Long, String, Int, and so on.
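For example, a minimal sketch using plain Scala and java.nio (no Spark needed); the path count.txt and the literal value are illustrative stand-ins for the real countval:

```scala
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets

// Stand-in for the result of data.map((_, "")).reduceByKey(_ + _).count
val countval: Long = 42L

// Write the Long as text to a local file
Files.write(
  Paths.get("count.txt"),
  countval.toString.getBytes(StandardCharsets.UTF_8)
)
```

This runs entirely on the driver, which is fine because an action's result is an ordinary JVM value, not distributed data.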

What @szefuf said is correct: after count you have a Long, which you can save any way you want. If you want to save it with .saveAsTextFile(), you have to convert it to an RDD first:

 sc.parallelize(Seq(countval)).saveAsTextFile("/file/location")

The parallelize method in SparkContext turns a collection of values into an RDD, so you first need to wrap the single value in a single-element sequence. Then you can save it.
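Putting it all together, a self-contained sketch assuming a local Spark installation; the app name, input path, and output path are illustrative, and passing numSlices = 1 to parallelize is an assumption that keeps the output to a single part file:

```scala
import org.apache.spark.sql.SparkSession

object SaveCount {
  def main(args: Array[String]): Unit = {
    // Local-mode session for illustration; on a cluster, drop .master(...)
    val spark = SparkSession.builder()
      .appName("save-count")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val data = sc.textFile("hdfs:///input/words")  // example input path
    val countval = data.map((_, "")).reduceByKey(_ + _).count

    // One partition => a single part-00000 file in the output directory
    sc.parallelize(Seq(countval), numSlices = 1)
      .saveAsTextFile("hdfs:///output/count")

    spark.stop()
  }
}
```

Note that saveAsTextFile writes a directory of part files, not a single file, which is why controlling the number of partitions matters here.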
