
RDD processing in a Scala file

I have loaded two CSV files, converted the RDDs to DataFrames, and written some JOIN conditions to perform on them. I used the Spark shell for this. Now I want to bundle all these commands in a .scala file and run it through a spark-submit job. Currently I am not using an IDE and want to run it from the terminal. Do I need a main method for this? If yes, kindly suggest how I can proceed.
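For reference, this is roughly the kind of script in question, as it might look in the Spark shell where the SparkSession is already available as spark (a sketch only; the file paths, the join column "id", and the join type are assumptions for illustration):

// Load the two CSV files as DataFrames (paths assumed)
val df1 = spark.read.option("header", "true").csv("C:\\data\\file1.csv")
val df2 = spark.read.option("header", "true").csv("C:\\data\\file2.csv")

// Join on an assumed common column "id"
val joined = df1.join(df2, Seq("id"), "inner")
joined.show()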

Thanks much for your time and input.

You don't need a main method to run a Scala script in the Spark shell.

1. Write all the steps in a file and save it as file.scala

2. Run the Spark shell like: spark-shell -i C:\spark\file.scala

Below is the sample code I wrote in file.scala

// Load the scores file as an RDD of lines
val rdd = sc.textFile("C:\\Users\\manoj kumar dhakad\\Desktop\\scores.txt")
// Collect to the driver and print each line
rdd.collect.foreach(println)

Below is how I submitted it.

spark-shell -i C:\spark\file.scala

Sample output

rdd: org.apache.spark.rdd.RDD[String] = C:\Users\manoj kumar dhakad\Desktop\scores.txt MapPartitionsRDD[1] at textFile at <console>:24
75,89,150,135,200,76,12,100,150,28,178,189,167,200,175,150,87,99,129,149,176,200,87,35,157,189
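If you do eventually want to run it through spark-submit instead, then yes, you need a main method (or an object extending App), because spark-submit runs a compiled application rather than an interactive script. Below is a minimal sketch, assuming a hypothetical object name JoinJob and the same kind of input file:

import org.apache.spark.sql.SparkSession

// Hypothetical standalone application; the object name JoinJob and the
// input path are assumptions for illustration.
object JoinJob {
  def main(args: Array[String]): Unit = {
    // Unlike spark-shell, spark-submit does not create sc/spark for you,
    // so the session must be built explicitly.
    val spark = SparkSession.builder().appName("JoinJob").getOrCreate()

    val rdd = spark.sparkContext.textFile("C:\\spark\\scores.txt")
    rdd.collect.foreach(println)

    spark.stop()
  }
}

After packaging it into a jar (for example with sbt package), it could be submitted with something like: spark-submit --class JoinJob your-app.jar (the jar name here is a placeholder).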
