
RDD processing in a Scala file

I have loaded two CSV files, converted the RDDs to DataFrames, and written some JOIN conditions to perform on them. I used the Spark shell for this. Now I want to bundle all these commands in a .scala file and run it through a spark-submit job. Currently I am not using an IDE and want to run it from the terminal. Do I need a main method for this? If yes, kindly suggest how I can proceed.
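For reference, this is roughly the kind of script in question, as it might look in the Spark shell where the SparkSession is already available as spark (a sketch only; the file paths, the join column "id", and the join type are assumptions for illustration):

// Load the two CSV files as DataFrames (paths assumed)
val df1 = spark.read.option("header", "true").csv("C:\\data\\file1.csv")
val df2 = spark.read.option("header", "true").csv("C:\\data\\file2.csv")

// Join on an assumed common column "id"
val joined = df1.join(df2, Seq("id"), "inner")
joined.show()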

Thanks much for your time and input.

You don't need a main method to run a Scala script in the Spark shell.

1. Write all the steps in a file and save it as file.scala

2. Run the Spark shell like: spark-shell -i C:\spark\file.scala

Below is the sample code I wrote in file.scala

// Load the scores file as an RDD of lines
val rdd = sc.textFile("C:\\Users\\manoj kumar dhakad\\Desktop\\scores.txt")
// Collect to the driver and print each line
rdd.collect.foreach(println)

Below is how I submitted it.

spark-shell -i C:\spark\file.scala

Sample output

rdd: org.apache.spark.rdd.RDD[String] = C:\Users\manoj kumar dhakad\Desktop\scores.txt MapPartitionsRDD[1] at textFile at <console>:24
75,89,150,135,200,76,12,100,150,28,178,189,167,200,175,150,87,99,129,149,176,200,87,35,157,189
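If you do eventually want to run it through spark-submit instead, then yes, you need a main method (or an object extending App), because spark-submit runs a compiled application rather than an interactive script. Below is a minimal sketch, assuming a hypothetical object name JoinJob and the same kind of input file:

import org.apache.spark.sql.SparkSession

// Hypothetical standalone application; the object name JoinJob and the
// input path are assumptions for illustration.
object JoinJob {
  def main(args: Array[String]): Unit = {
    // Unlike spark-shell, spark-submit does not create sc/spark for you,
    // so the session must be built explicitly.
    val spark = SparkSession.builder().appName("JoinJob").getOrCreate()

    val rdd = spark.sparkContext.textFile("C:\\spark\\scores.txt")
    rdd.collect.foreach(println)

    spark.stop()
  }
}

After packaging it into a jar (for example with sbt package), it could be submitted with something like: spark-submit --class JoinJob your-app.jar (the jar name here is a placeholder).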
