saveAsTable ends in failure in Spark-yarn cluster environment

I set up a Spark-on-YARN cluster environment and tried Spark SQL with spark-shell:

spark-shell --master yarn --deploy-mode client --conf spark.yarn.archive=hdfs://hadoop_273_namenode_ip:namenode_port/spark-archive.zip

One thing to mention: Spark is running on Windows 7. After spark-shell started up successfully, I executed the commands below:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val df_mysql_address = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://mysql_db_ip/db").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "ADDRESS").option("user", "root").option("password", "root").load()
scala> df_mysql_address.show
scala> df_mysql_address.write.format("parquet").saveAsTable("address_local")

"show" command returns result-set correctly, but the "saveAsTable" ends in failure. “ show”命令正确返回结果集,但“ saveAsTable”失败。 The error message says: 错误消息显示:

java.io.IOException: Mkdirs failed to create file:/C:/jshen.workspace/programs/spark-2.2.0-bin-hadoop2.7/spark-warehouse/address_local/_temporary/0/_temporary/attempt_20171018104423_0001_m_000000_0 (exists=false, cwd=file:/tmp/hadoop/nm-local-dir/usercache/hduser/appcache/application_1508319604173_0005/container_1508319604173_0005_01_000003)

I expected the table to be saved in the Hadoop cluster, but as you can see, the directory (C:/jshen.workspace/programs/spark-2.2.0-bin-hadoop2.7/spark-warehouse) is a folder on my Windows 7 machine, not in HDFS, and not even on the Hadoop Ubuntu machine.
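A quick way to confirm where managed tables will land is to inspect the session configuration from the same spark-shell. A minimal sketch, assuming the `spark` session and `sc` context that spark-shell 2.x creates automatically:

scala> spark.conf.get("spark.sql.warehouse.dir")   // directory saveAsTable uses for managed tables
scala> sc.hadoopConfiguration.get("fs.defaultFS")  // default filesystem the paths resolve against

If the first value starts with file:/C:/..., the warehouse is pointing at the driver's local filesystem rather than HDFS, which matches the Mkdirs error above: the YARN executors on the Linux nodes cannot create that Windows path.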

What should I do? Please advise, thanks.

The way to get rid of the problem is to provide the "path" option before the save operation, as shown below:

scala> df_mysql_address.write.option("path", "/spark-warehouse").format("parquet").saveAsTable("address_local")
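With fs.defaultFS pointing at HDFS, the relative "/spark-warehouse" path above resolves to a directory on the cluster. You can also make the target explicit with a fully qualified URI; a hedged variant of the same call, reusing the placeholder namenode host and port from the spark-shell command at the top (substitute your cluster's real values):

scala> df_mysql_address.write.option("path", "hdfs://hadoop_273_namenode_ip:namenode_port/spark-warehouse").format("parquet").saveAsTable("address_local")

Alternatively, the warehouse location can be set once at launch via --conf spark.sql.warehouse.dir=hdfs://... so that plain saveAsTable calls write to HDFS without a per-write "path" option.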

Thanks @philantrovert.
