
Spark - jdbc write fails in Yarn cluster mode but works in spark-shell

I am using Spark 1.6.2, Hadoop 2.6, Scala 2.10.5 and Java 1.7.

I am using JDBC to read data from MSSQL and this works without any problem:

val hqlContext = new HiveContext(sc)

val url = "jdbc:sqlserver://1.1.1.1:1111;database=CIQOwnershipProcessing;user=OwnershipUser;password=Ownership123"
val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

val df1 = hqlContext.read.format("jdbc").options(
  Map("url" -> url, "driver" -> driver,
      "dbtable" -> "(select * from OwnershipStandardization_PositionSequence_tbl) as ps")).load()

And, while writing the dataframe back to MSSQL, I am using the JDBC write as shown below. This works fine in spark-shell but fails when I do spark-submit in yarn-cluster mode. What am I missing?

val prop = new java.util.Properties
df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop", prop)

This is what my spark-submit command looks like. As you can see, I am passing the SQL JDBC jar path too, and I have also specified the jdbc jar path in the "spark.executor.extraClassPath" property in spark-defaults.conf on all nodes of the cluster. Since the JDBC read is working, I doubt it has anything to do with the classpath.

spark-submit --class com.spgmi.csd.OshpStdCarryOver \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --num-executors 1 --executor-cores 2 \
  --driver-memory 3g --executor-memory 8g \
  --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,/usr/share/java/sqljdbc_4.1/enu/sqljdbc41.jar \
  --files $SPARK_HOME/conf/hive-site.xml \
  $SPARK_HOME/lib/spark-poc2-17.1.0.jar
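For reference, the spark-defaults.conf entries mentioned above look roughly like this (a sketch using the same jar path as in --jars; the spark.driver.extraClassPath line is an extra assumption, included because in yarn-cluster mode the driver also runs on a cluster node):

# sketch of spark-defaults.conf on each node; jar path matches the one in --jars
spark.executor.extraClassPath   /usr/share/java/sqljdbc_4.1/enu/sqljdbc41.jar
# assumption: the driver-side classpath may be needed too in yarn-cluster mode
spark.driver.extraClassPath     /usr/share/java/sqljdbc_4.1/enu/sqljdbc41.jar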

The error thrown in the Yarn-Cluster mode is:

17/01/05 10:21:31 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.InstantiationException: org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper
java.lang.InstantiationException: org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper
    at java.lang.Class.newInstance(Class.java:368)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:46)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:53)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
    at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278)
    at com.spgmi.csd.OshpStdCarryOver$.main(SparkOshpStdCarryOver.scala:175)
    at com.spgmi.csd.OshpStdCarryOver.main(SparkOshpStdCarryOver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)

I was facing the same issue. I resolved it by setting the driver connection property in prop:

val prop = new java.util.Properties
prop.setProperty("driver","com.mysql.jdbc.Driver")

Now pass this prop in:

df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop", prop)
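In the question's SQL Server setup, that would look like this (a sketch reusing the url val and the driver class string already defined in the question's read code):

// Sketch for the SQL Server case: force the driver class on the write path,
// using the same driver class that already works for the read.
val prop = new java.util.Properties
prop.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop", prop)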

Your problem feels very similar to SPARK-14204 and SPARK-14162, although those bugs were supposed to be fixed in Spark 1.6.2 (?!)

With a Type 4 JDBC driver you should not have to mention the "driver" property explicitly; the JAR should automatically register itself for the URL prefix it supports (here jdbc:sqlserver:).
But because of the bug, the Spark JDBC module may not use that registration to find the driver that implicitly matches the URL.

In other words: for reading, you force the "driver" property and the connection works; for writing, you don't force it, and it does not work. Aha!
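If you want to verify that theory, a quick diagnostic sketch (plain JDBC, nothing Spark-specific) is to print the drivers that DriverManager has actually registered in the JVM where the write runs:

import java.sql.DriverManager
import scala.collection.JavaConverters._

// List every JDBC driver registered in this JVM; SQLServerDriver from
// sqljdbc41.jar should show up here if its auto-registration ran.
DriverManager.getDrivers.asScala.foreach(d => println(d.getClass.getName))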
