
Bluemix Spark: spark-submit failing when downloading stderr and stdout?

I am using the Spark service in IBM Bluemix. I am trying to launch a Java application that runs a Spark job, using the spark-submit.sh script.

My command line is:

./spark-submit.sh --vcap ./VCAP.json --deploy-mode cluster --class org.apache.spark.examples.JavaSparkPi \
--master https://169.54.219.20 ~/Documents/Spark/JavaSparkPi.jar 

I am using the latest spark-submit.sh version (as of yesterday).

./spark-submit.sh --version
spark-submit.sh  VERSION : '1.0.0.0.20160330.1'

This worked fine a couple of weeks ago (with the old spark-submit.sh) but now I am getting the following error:

Downloading stdout_1461024849908170118
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    89    0    89    0     0     56      0 --:--:--  0:00:01 --:--:--   108
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to     stdout_1461024849908170118

Downloading stderr_1461024849908170118
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    89    0    89    0     0     50      0 --:--:--  0:00:01 --:--:--   108
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stderr to     stderr_1461024849908170118

Any ideas on what I am doing wrong? Thanks in advance.

EDIT:

Looking at the log file, I found that the problem is not actually in downloading stdout and stderr, but in submitting the job itself.

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FAILED",
  "message" : "Exception from the cluster:
org.apache.spark.SparkException: Failed to change container CWD
org.apache.spark.deploy.master.EgoApplicationManager.egoDriverExitCallback(EgoApplicationManager.scala:168)
org.apache.spark.deploy.master.MasterScheduleDelegatorDriver.onContainerExit(MasterScheduleDelegatorDriver.scala:144)
org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.handleActivityFinish(ResourceManagerEGOSlot.scala:555)
org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.callbackContainerStateChg(ResourceManagerEGOSlot.scala:525)
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:158)
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:157)
scala.Option.foreach(Option.scala:236)
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$.callbackContainerStateChg(ResourceManager.scala:157)",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20160420043532-0027-6e579720-2c9d-428f-b2c7-6613f4845146",
  "success" : true
}
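Note that in the response above, "success" : true only means the status query itself succeeded; the actual outcome is in "driverState". A minimal sketch of checking that field from a saved response (the file name status.json and the sed-based extraction are my own illustration, not part of spark-submit.sh):

```shell
# Save a submission-status response (abbreviated here) and check driverState.
cat > status.json <<'EOF'
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FAILED",
  "success" : true
}
EOF

# Extract the driverState value; "success" alone is not enough to
# conclude the job ran correctly.
STATE=$(sed -n 's/.*"driverState" *: *"\([A-Z]*\)".*/\1/p' status.json)
echo "driverState=$STATE"
if [ "$STATE" != "FINISHED" ]; then
  echo "Driver did not finish cleanly" >&2
fi
```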
driverStatus is FAILED

EDIT2:

The problem I had when submitting the job was finally solved simply by creating a brand-new instance of the Spark service. My job now executes and finishes after a few seconds.

But I still receive an error when trying to download the stdout and stderr files.

Downloading stdout_1461156506108609180
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
  0    90    0    90    0     0     61      0 --:--:--  0:00:01 --:--:--   125
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180

Downloading stderr_1461156506108609180
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
  0    90    0    90    0     0     56      0 --:--:--  0:00:01 --:--:--   109
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stderr to stderr_1461156506108609180

Any ideas?

I found that the old spark-submit script was trying to retrieve stdout and stderr from the workdir folder ...

Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to     stdout_1461024849908170118

While the new one (downloaded yesterday) was trying to download them from the workdir2 folder ...

Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180

The folder in use is determined by the variable SS_SPARK_WORK_DIR, which is initialized in spark-submit.sh:

if [ -z ${SS_SPARK_WORK_DIR} ];  then SS_SPARK_WORK_DIR="workdir2"; fi # Work directory on spark cluster
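Because the script only applies the default when the variable is unset or empty, the pattern can in principle be overridden from the environment instead of editing the script (whether a given version of spark-submit.sh honors an exported value is an assumption; the snippet below only demonstrates the default-assignment pattern itself):

```shell
# With the variable unset, the script's check falls back to the default.
unset SS_SPARK_WORK_DIR
if [ -z "${SS_SPARK_WORK_DIR}" ]; then SS_SPARK_WORK_DIR="workdir2"; fi
echo "default: $SS_SPARK_WORK_DIR"       # default: workdir2

# With the variable already set, the -z test fails and the value survives,
# so something like: SS_SPARK_WORK_DIR=workdir ./spark-submit.sh ...
# would reach the script unchanged.
SS_SPARK_WORK_DIR="workdir"
if [ -z "${SS_SPARK_WORK_DIR}" ]; then SS_SPARK_WORK_DIR="workdir2"; fi
echo "override: $SS_SPARK_WORK_DIR"      # override: workdir
```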

I changed the value to workdir and everything works now. I have since downloaded a new (today's) spark-submit.sh from the Bluemix site, and this problem has been fixed there as well: the variable now points to workdir.

So, if anything fails, make sure you have the latest spark-submit.sh script from Bluemix.
