Spark driver node and worker node for a Spark application in Standalone cluster
standalone spark: worker didn't show up
I have two questions.
Here is my code:
import org.apache.spark.{SparkConf, SparkContext}

object Hi {
  def main(args: Array[String]): Unit = {
    println("Sucess")
    // "local" runs the driver and the executor inside this single JVM
    val conf = new SparkConf().setAppName("HI").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile("src/main/scala/source.txt")
    // Split each line on "::" into a (range, ratednum) pair
    val rows = textFile.map { line =>
      val fields = line.split("::")
      (fields(0), fields(1).toInt)
    }
    val x = rows.map { case (range, ratednum) => range }.collect.mkString("::")
    val y = rows.map { case (range, ratednum) => ratednum }.collect.mkString("::")
    println(x)
    println(y)
    println("Sucess2")
  }
}
Here is some of the output:
15/04/26 16:49:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/26 16:49:57 INFO SparkUI: Started SparkUI at http://192.168.1.105:4040
15/04/26 16:49:57 INFO Executor: Starting executor ID <driver> on host localhost
15/04/26 16:49:57 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.105:64952/user/HeartbeatReceiver
15/04/26 16:49:57 INFO NettyBlockTransferService: Server created on 64954
15/04/26 16:49:57 INFO BlockManagerMaster: Trying to register BlockManager
15/04/26 16:49:57 INFO BlockManagerMasterActor: Registering block manager localhost:64954 with 983.1 MB RAM, BlockManagerId(<driver>, localhost, 64954)
.....
15/04/26 16:49:59 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839
15/04/26 16:49:59 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MapPartitionsRDD[4] at map at Hi.scala:25)
15/04/26 16:49:59 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/04/26 16:49:59 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1331 bytes)
15/04/26 16:49:59 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/04/26 16:49:59 INFO HadoopRDD: Input split: file:/Users/Winsome/IdeaProjects/untitled/src/main/scala/source.txt:0+23
15/04/26 16:49:59 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1787 bytes result sent to driver
15/04/26 16:49:59 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 13 ms on localhost (1/1)
15/04/26 16:49:59 INFO DAGScheduler: Stage 1 (collect at Hi.scala:25) finished in 0.013 s
15/04/26 16:49:59 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/04/26 16:49:59 INFO DAGScheduler: Job 1 finished: collect at Hi.scala:25, took 0.027784 s
1~1::2~2::3~3
10::20::30
Sucess2
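My first question: when I check http://localhost:8080/ there are no workers, and I also cannot open http://192.168.1.105:4040.
Is that because I am running Spark standalone?
How can I fix this?
(My environment is a Mac, and the IDE is IntelliJ.)
My second question: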
val x = rows.map{case (range , ratednum) => range}.collect.mkString("::")
val y = rows.map{case (range , ratednum) => ratednum}.collect.mkString("::")
println(x)
println(y)
I think there should be an easier way to get x and y (something like rows[range], rows[ratenum]), but I am not familiar with Scala. Could you give me some advice?
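I'm not sure about your first question, but reading your log I see that the task ran for only 13 ms, which may be why you didn't see a worker. Run a longer job and you may see it.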
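One way to try the "run a longer job" suggestion is to keep the driver alive after the job finishes, since the application UI on port 4040 is served by the driver and goes away when the program exits. The sketch below is an illustration only, not the asker's code: the object name `HiKeepAlive`, the helper `doubledCount`, and the small `parallelize` job are made up for the example. It stays in `local` mode as in the question, where the application does not register with a standalone master, so the page on port 8080 would presumably still list no workers.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HiKeepAlive {
  // Hypothetical stand-in job: doubles 1..1000 and counts the results
  def doubledCount(sc: SparkContext): Long =
    sc.parallelize(1 to 1000).map(_ * 2).count()

  def main(args: Array[String]): Unit = {
    // Same local-mode configuration as in the question
    val conf = new SparkConf().setAppName("HI").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      println("count = " + doubledCount(sc))
      // Block until a key is pressed so the UI on port 4040 stays reachable
      println("Job finished; press Enter to stop the driver")
      System.in.read()
    } finally {
      sc.stop() // shuts down the context and its web UI
    }
  }
}
```

While the program is waiting for input, the address printed in the "Started SparkUI at ..." log line should be reachable in a browser.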
As for the second question: yes, there is a simpler way to write it:
val x = rows.map{(tuple) => tuple._1}.collect.mkString("::")
This works because your RDD is made up of Scala Tuple objects, and each of these tuples has two fields, which you can access with _1 and _2 respectively.
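To make the accessors concrete, here is a minimal sketch that runs without Spark. It assumes an in-memory `Seq` standing in for the RDD, with values mirroring the output shown in the question; the object name `TupleAccess` is made up for the example.

```scala
object TupleAccess {
  // In-memory stand-in for the RDD of (range, ratednum) pairs
  val rows: Seq[(String, Int)] = Seq(("1~1", 10), ("2~2", 20), ("3~3", 30))

  // _1 selects the first field of each tuple, _2 the second
  val x: String = rows.map(_._1).mkString("::")
  val y: String = rows.map(_._2).mkString("::")

  // On plain Scala collections, unzip splits the pairs into two sequences
  val (ranges, counts) = rows.unzip

  def main(args: Array[String]): Unit = {
    println(x) // 1~1::2~2::3~3
    println(y) // 10::20::30
    println(ranges.mkString("::"))
    println(counts.mkString("::"))
  }
}
```

On an actual pair RDD there is no `unzip`, but `rows.keys` and `rows.values` should give the same two projections before calling `collect`.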