
Execute Spark job on Hortonworks Sandbox from outside

I'm running Hortonworks Sandbox as a virtual machine using VirtualBox.

Using an IDE on my local machine (IntelliJ IDEA), I'm trying to execute a Spark job on the sandbox virtual machine, but without success.

This is the Spark job code:

import org.apache.spark.{SparkConf, SparkContext}

object HelloWorld {

  def main(args: Array[String]): Unit = {
    val logFile = "file:///tmp/words.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://127.0.0.1:4040")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }

}

The error logs I'm getting from the execution are:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/04 13:16:50 INFO SparkContext: Running Spark version 2.2.0
18/04/04 13:16:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/04 13:16:50 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
    ...
18/04/04 13:16:50 INFO SparkContext: Submitted application: Simple Application
18/04/04 13:16:50 INFO SecurityManager: Changing view acls to: jaramos
18/04/04 13:16:50 INFO SecurityManager: Changing modify acls to: jaramos
18/04/04 13:16:50 INFO SecurityManager: Changing view acls groups to: 
18/04/04 13:16:50 INFO SecurityManager: Changing modify acls groups to: 
18/04/04 13:16:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jaramos); groups with view permissions: Set(); users  with modify permissions: Set(jaramos); groups with modify permissions: Set()
18/04/04 13:16:51 INFO Utils: Successfully started service 'sparkDriver' on port 54849.
18/04/04 13:16:51 INFO SparkEnv: Registering MapOutputTracker
18/04/04 13:16:51 INFO SparkEnv: Registering BlockManagerMaster
18/04/04 13:16:51 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/04/04 13:16:51 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/04/04 13:16:51 INFO DiskBlockManager: Created local directory at C:\Users\jaramos\AppData\Local\Temp\blockmgr-93e05db6-a65a-4a3f-b238-9cde5d918bc2
18/04/04 13:16:51 INFO MemoryStore: MemoryStore started with capacity 1986.6 MB
18/04/04 13:16:51 INFO SparkEnv: Registering OutputCommitCoordinator
18/04/04 13:16:51 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/04/04 13:16:51 INFO Utils: Successfully started service 'SparkUI' on port 4041.
18/04/04 13:16:51 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.75.1:4041
18/04/04 13:16:52 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:4040...
18/04/04 13:16:52 INFO TransportClientFactory: Successfully created connection to /127.0.0.1:4040 after 25 ms (0 ms spent in bootstraps)
18/04/04 13:16:52 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /127.0.0.1:4040 is closed
18/04/04 13:16:52 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 127.0.0.1:4040
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    ...
Caused by: java.io.IOException: Connection from /127.0.0.1:4040 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)
    ...
18/04/04 13:17:12 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:4040...
18/04/04 13:17:12 INFO TransportClientFactory: Found inactive connection to /127.0.0.1:4040, creating a new one.
18/04/04 13:17:12 INFO TransportClientFactory: Successfully created connection to /127.0.0.1:4040 after 2 ms (0 ms spent in bootstraps)
18/04/04 13:17:12 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /127.0.0.1:4040 is closed
18/04/04 13:17:12 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 127.0.0.1:4040
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    ...
Caused by: java.io.IOException: Connection from /127.0.0.1:4040 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)
    ...
18/04/04 13:17:32 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:4040...
18/04/04 13:17:32 INFO TransportClientFactory: Found inactive connection to /127.0.0.1:4040, creating a new one.
18/04/04 13:17:32 INFO TransportClientFactory: Successfully created connection to /127.0.0.1:4040 after 1 ms (0 ms spent in bootstraps)
18/04/04 13:17:32 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /127.0.0.1:4040 is closed
18/04/04 13:17:32 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 127.0.0.1:4040
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    ...
Caused by: java.io.IOException: Connection from /127.0.0.1:4040 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)
    ...
18/04/04 13:17:52 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
18/04/04 13:17:52 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
18/04/04 13:17:52 INFO SparkUI: Stopped Spark web UI at http://10.0.75.1:4041
18/04/04 13:17:52 INFO StandaloneSchedulerBackend: Shutting down all executors
18/04/04 13:17:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
18/04/04 13:17:52 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54923.
18/04/04 13:17:52 INFO NettyBlockTransferService: Server created on 10.0.75.1:54923
18/04/04 13:17:52 WARN StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
18/04/04 13:17:52 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/04/04 13:17:52 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.75.1, 54923, None)
18/04/04 13:17:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/04 13:17:52 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.75.1:54923 with 1986.6 MB RAM, BlockManagerId(driver, 10.0.75.1, 54923, None)
18/04/04 13:17:52 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.75.1, 54923, None)
18/04/04 13:17:52 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.75.1, 54923, None)
18/04/04 13:17:52 INFO MemoryStore: MemoryStore cleared
18/04/04 13:17:52 INFO BlockManager: BlockManager stopped
18/04/04 13:17:52 INFO BlockManagerMaster: BlockManagerMaster stopped
18/04/04 13:17:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/04 13:17:52 INFO SparkContext: Successfully stopped SparkContext
18/04/04 13:17:52 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
    at HelloWorld$.main(HelloWorld.scala:8)
    at HelloWorld.main(HelloWorld.scala)
18/04/04 13:17:52 INFO SparkContext: SparkContext already stopped.
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
    at HelloWorld$.main(HelloWorld.scala:8)
    at HelloWorld.main(HelloWorld.scala)
18/04/04 13:17:52 INFO ShutdownHookManager: Shutdown hook called
18/04/04 13:17:52 INFO ShutdownHookManager: Deleting directory C:\Users\jaramos\AppData\Local\Temp\spark-0e2461c0-f3fa-402b-8fa9-d4e3ede388d1

How can I connect to the remote Spark machine?

Thanks in advance!

Use port mapping to expose all the relevant Hadoop and environment component ports, for example 9083 for the Hive metastore. Then copy your hive-site.xml and hdfs-site.xml into your IntelliJ resources directory. It should work then.
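Beyond the port mapping, one detail in the question's code is worth checking: spark://127.0.0.1:4040 points at the driver's web UI port, not at a master. A standalone Spark master listens on 7077 by default, so that is the port that would need to be forwarded from the guest (e.g. a VirtualBox NAT rule like VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "spark-master,tcp,,7077,,7077", with the VM name being whatever yours is). Note also that the HDP sandbox normally runs Spark on YARN rather than as a standalone cluster, so a spark:// master URL only works if you started a standalone master inside the VM yourself; otherwise submit against YARN with the copied config files as described above. A minimal sketch of the adjusted driver code, assuming a standalone master is running in the guest and port 7077 is forwarded to the host:

import org.apache.spark.{SparkConf, SparkContext}

object HelloWorld {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple Application")
      // 7077 is the standalone master's default RPC port; 4040 is only the web UI.
      // Assumes guest port 7077 is forwarded to the host (hypothetical NAT rule).
      .setMaster("spark://127.0.0.1:7077")
      // Executors inside the VM must be able to call back to the driver on the host.
      // 10.0.75.1 is the host address seen in the logs above; adjust for your network.
      .set("spark.driver.host", "10.0.75.1")
    val sc = new SparkContext(conf)
    val logData = sc.textFile("file:///tmp/words.txt", 2).cache()
    val numAs = logData.filter(_.contains("a")).count()
    val numBs = logData.filter(_.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }

}

If the connection succeeds but the job then hangs, verify that the address set in spark.driver.host is actually reachable from inside the guest; if it is not, a host-only or bridged network adapter makes the driver/executor callback simpler than NAT.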
