
Submitting a Job to a Remote Apache Spark Server

Apache Spark (v1.6.1) is started as a service on an Ubuntu machine (10.10.0.102) using ./start-all.sh.

Now I need to submit a job to this server remotely using the Java API.

Following is the Java client code, running from a different machine (10.10.0.95):

    String mySqlConnectionUrl = "jdbc:mysql://localhost:3306/demo?user=sec&password=sec";

    String jars[] = new String[] {"/home/.m2/repository/com/databricks/spark-csv_2.10/1.4.0/spark-csv_2.10-1.4.0.jar", 
            "/home/.m2/repository/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar", 
            "/home/.m2/repository/mysql/mysql-connector-java/6.0.2/mysql-connector-java-6.0.2.jar"};
    SparkConf sparkConf = new SparkConf()
            .setAppName("sparkCSVWriter")
            .setMaster("spark://10.10.0.102:7077")
            .setJars(jars);

    JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);

    SQLContext sqlContext = new SQLContext(javaSparkContext);

    Map<String, String> options = new HashMap<>();
    options.put("driver", "com.mysql.jdbc.Driver");
    options.put("url", mySqlConnectionUrl);
    options.put("dbtable", "(select p.FIRST_NAME from person p) as firstName");

    DataFrame dataFrame = sqlContext.read().format("jdbc").options(options).load();

    dataFrame.write()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .option("delimiter", "|")
        .option("quote", "\"")
        .option("quoteMode", QuoteMode.NON_NUMERIC.toString())
        .option("escape", "\\")
        .save("persons.csv");

    Configuration hadoopConfiguration = javaSparkContext.hadoopConfiguration();
    FileSystem hdfs = FileSystem.get(hadoopConfiguration);

    FileUtil.copyMerge(hdfs, new Path("persons.csv"), hdfs, new Path("/home/persons1.csv"), true, hadoopConfiguration, "");

As per the code, I need to convert RDBMS data to CSV/JSON using Spark. But when I run this client application, it connects to the remote Spark server, yet the console continuously prints the following WARN message:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 

And at the server side, on the Spark UI under running applications > executor summary > stderr log, I get the following error:

Exception in thread "main" java.io.IOException: Failed to connect to /192.168.56.1:53112
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.56.1:53112
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more

But there is no IP address configured as 192.168.56.1 anywhere. So is there any configuration missing?

Actually, my client machine (10.10.0.95) is a Windows machine. When I tried to submit the Spark job from another Ubuntu machine (10.10.0.155), I was able to run the same Java client code successfully.

While debugging in the Windows client environment, I see the following log when I submit the Spark job:

INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.56.1:61552]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 61552.
INFO MemoryStore: MemoryStore started with capacity 2.4 GB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4044.
INFO SparkUI: Started SparkUI at http://192.168.56.1:4044

As per log line number 2, it registers the client (driver) with 192.168.56.1.

Whereas on the Ubuntu client:

INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.0.155:42786]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 42786.
INFO MemoryStore: MemoryStore started with capacity 511.1 MB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4040.
INFO SparkUI: Started SparkUI at http://10.10.0.155:4040

As per log line number 2, it registers the client (driver) with 10.10.0.155, the same as the actual IP address.
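
To figure out where 192.168.56.1 comes from on the Windows machine, a small diagnostic snippet like the one below (plain JDK, nothing Spark-specific; the class name ListLocalAddresses is just for illustration) can list every network interface and the addresses bound to it:

    import java.net.InetAddress;
    import java.net.NetworkInterface;
    import java.net.SocketException;
    import java.util.Collections;

    public class ListLocalAddresses {
        public static void main(String[] args) throws SocketException {
            // Print every network interface and the addresses bound to it,
            // to see which adapter owns 192.168.56.1.
            for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    System.out.println(nic.getDisplayName() + " -> " + addr.getHostAddress());
                }
            }
        }
    }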

If anybody finds out what the problem is with the Windows client, please let the community know.

[UPDATE]

I am running this whole environment in VirtualBox. The Windows machine is my host and Ubuntu is a guest, and Spark is installed on the Ubuntu machine. In this environment, VirtualBox installs an Ethernet adapter, "VirtualBox Host-Only Network", with IPv4 address 192.168.56.1, and Spark registers this IP as the client IP instead of the actual IP address 10.10.0.95.
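
One possible workaround (just a sketch, not verified on my side yet) is to explicitly tell Spark which address the driver should advertise, by setting the spark.driver.host property on the same SparkConf as above before creating the JavaSparkContext:

    SparkConf sparkConf = new SparkConf()
            .setAppName("sparkCSVWriter")
            .setMaster("spark://10.10.0.102:7077")
            .setJars(jars)
            // Advertise the real LAN address of the Windows client instead of the
            // VirtualBox Host-Only address (192.168.56.1), so executors can connect back.
            .set("spark.driver.host", "10.10.0.95");

Setting the SPARK_LOCAL_IP environment variable on the client to 10.10.0.95, or disabling the VirtualBox Host-Only adapter while submitting, should have a similar effect.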
