
Spark Unable to find JDBC Driver

So I've been using sbt with assembly to package all my dependencies into a single jar for my Spark jobs. I've got several jobs where I was using c3p0 to set up connection pool information, broadcast that out, and then use foreachPartition on the RDD to grab a connection and insert the data into the database. In my sbt build script, I include

"mysql" % "mysql-connector-java" % "5.1.33"

This makes sure the JDBC connector is packaged up with the job. Everything works great.
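For context, the pattern in those jobs looks roughly like the sketch below (class, table, and variable names are illustrative; the connection settings are broadcast and the c3p0 pool is built per partition):

import com.mchange.v2.c3p0.ComboPooledDataSource
import org.apache.spark.rdd.RDD

case class DbConfig(url: String, user: String, password: String)

def saveToMySql(rows: RDD[(Int, String)], dbConf: DbConfig): Unit = {
  // Broadcast the connection settings once, then open connections per partition.
  val broadcastConf = rows.sparkContext.broadcast(dbConf)
  rows.foreachPartition { partition =>
    val cfg = broadcastConf.value
    val pool = new ComboPooledDataSource()
    pool.setDriverClass("com.mysql.jdbc.Driver") // needs the connector jar on the executor classpath
    pool.setJdbcUrl(cfg.url)
    pool.setUser(cfg.user)
    pool.setPassword(cfg.password)

    val conn = pool.getConnection()
    val stmt = conn.prepareStatement("INSERT INTO my_table (id, name) VALUES (?, ?)")
    partition.foreach { case (id, name) =>
      stmt.setInt(1, id)
      stmt.setString(2, name)
      stmt.executeUpdate()
    }
    stmt.close()
    conn.close()
    pool.close()
  }
}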

So recently I started playing around with Spark SQL and realized it's much easier to simply take a DataFrame and save it to a JDBC source with the new features in 1.3.0.
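The call in question is roughly the following (a minimal sketch using the 1.3.x DataFrame API; the table name and source DataFrame are placeholders, and later Spark versions express the same thing as df.write.jdbc(...)):

import org.apache.spark.sql.SQLContext

def saveToJdbc(sqlContext: SQLContext): Unit = {
  val url = "jdbc:mysql://some.domain.com/myschema?user=user&password=password"
  val df = sqlContext.table("some_table")        // placeholder source DataFrame
  df.insertIntoJDBC(url, "target_table", false)  // this is where the exception below is thrown
}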

I'm getting the following exception:

java.sql.SQLException: No suitable driver found for jdbc:mysql://some.domain.com/myschema?user=user&password=password
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)

When I was running this locally I got around it by setting

SPARK_CLASSPATH=/path/where/mysql-connector-is.jar

Ultimately what I'm wanting to know is: why is the job not capable of finding the driver when it should be packaged up with it? My other jobs never had this problem. From what I can tell, both c3p0 and the DataFrame code make use of java.sql.DriverManager (which handles loading everything for you, as far as I can tell), so it should work just fine. If there is something that prevents the assembly approach from working, what do I need to do to make this work?

This person was having a similar issue: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-DataFrame-with-MySQL-td22178.html

Have you updated your connector drivers to the most recent version? Also, did you specify the driver class when you called load()?

Map<String, String> options = new HashMap<String, String>();
options.put("url", "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456");
options.put("dbtable", "video");
options.put("driver", "com.mysql.cj.jdbc.Driver"); // explicitly name the driver class (use com.mysql.jdbc.Driver for the 5.1.x connector)
DataFrame jdbcDF = sqlContext.load("jdbc", options); 
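For a Scala job like the one in the question, the equivalent 1.3-style call would look roughly like this (a sketch; the connection details are the same placeholders as above, and explicitly passing "driver" avoids relying on DriverManager's automatic lookup):

import org.apache.spark.sql.SQLContext

def loadVideoTable(sqlContext: SQLContext) = {
  sqlContext.load("jdbc", Map(
    "url"     -> "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456",
    "dbtable" -> "video",
    "driver"  -> "com.mysql.jdbc.Driver"  // com.mysql.cj.jdbc.Driver for Connector/J 8.x
  ))
}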

In spark/conf/spark-defaults.conf, you can also set spark.driver.extraClassPath and spark.executor.extraClassPath to the path of your MySQL driver .jar.

Both the Spark driver and the executors need the MySQL driver on the classpath, so specify:

spark.driver.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
spark.executor.extraClassPath = <path>/mysql-connector-java-5.1.36.jar

These options are clearly mentioned in the Spark docs: --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

The mistake I was making was putting these options after my application's jar.

However, the correct way is to specify these options immediately after spark-submit:

spark-submit --driver-class-path /somepath/project/mysql-connector-java-5.1.30-bin.jar --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar --class com.package.MyClass target/scala-2.11/project_2.11-1.0.jar

With Spark 2.2.0, the problem was corrected for me by adding extra classpath information for the SparkSession in the Python script:

    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .config("spark.driver.extraClassPath", "/path/to/jdbc/driver/postgresql-42.1.4.jar") \
        .getOrCreate()

See the official documentation: https://spark.apache.org/docs/latest/configuration.html

In my case, Spark is not launched from a CLI command, but from the Django framework (https://www.djangoproject.com/).

spark.driver.extraClassPath does not work in client mode:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.
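In other words, the pattern below is too late in client mode (an illustrative sketch only; the path is a placeholder), because the JVM running this code is the driver that has already started without the jar:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("jdbc-example")
  .set("spark.driver.extraClassPath", "/path/to/mysql-connector-java-5.1.40-bin.jar") // ignored in client mode
val sc = new SparkContext(conf)

Passing --driver-class-path to spark-submit, as shown above, is the reliable alternative.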

The environment variable SPARK_CLASSPATH has been deprecated in Spark 1.0+.

You should first copy the JDBC driver jars onto each executor under the same local filesystem path, and then use the following options in your spark-submit:

--driver-class-path "driver_local_file_system_jdbc_driver1.jar:driver_local_file_system_jdbc_driver2.jar"
--conf "spark.executor.extraClassPath=executors_local_file_system_jdbc_driver1.jar:executors_local_file_system_jdbc_driver2.jar"

For example, in the case of Teradata you need both terajdbc4.jar and tdgssconfig.jar.

Alternatively, modify compute_classpath.sh on all worker nodes; the Spark documentation says:

The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.

There is a simple Java trick to solve your problem: call Class.forName() on the driver class, which runs the driver's static initializer and registers it with DriverManager before the connection is opened. For example:

import java.sql.DriverManager
import org.apache.spark.rdd.{JdbcRDD, RDD}

val customers: RDD[(Int, String)] = new JdbcRDD(sc, () => {
    Class.forName("com.mysql.jdbc.Driver") // registers the driver with DriverManager
    DriverManager.getConnection(jdbcUrl)
  },
  "SELECT id, name FROM customer WHERE ? < id AND id <= ?",
  0, range, partitions, r => (r.getInt(1), r.getString(2)))

Check the docs.

I had the same problem running jobs over a Mesos cluster in cluster mode.

To use a JDBC driver, it is necessary to add the dependency to the system classpath, not to the framework classpath. The only way I found to do this was by adding the dependency to the file spark-defaults.conf on every instance of the cluster.

The properties to add are spark.driver.extraClassPath and spark.executor.extraClassPath, and the paths must be on the local file system.

I added the jar file to SPARK_CLASSPATH in spark-env.sh and it works:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/local/spark-1.6.3-bin-hadoop2.6/lib/mysql-connector-java-5.1.40-bin.jar

The simple way is to copy mysql-connector-java-5.1.47.jar into the spark-2.4.3\jars\ directory.

I was facing the same issue when I was trying to run the spark-shell command from my Windows machine. The path that you pass for the driver location, as well as for the jar that you will be using, should be in double quotes; otherwise it gets misinterpreted and you will not get the exact output that you want.

You would also have to install the JDBC driver for SQL Server from this link: JDBC Driver.

I have used the below command, and it works fine for me on my Windows machine:

spark-shell --driver-class-path "C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server\sqljdbc_6.0\enu\jre8\sqljdbc42.jar" --jars "C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server\sqljdbc_6.0\enu\jre8\sqljdbc42.jar"
