How do I connect Spark to a JDBC driver in Zeppelin?
I am trying to pull data from a SQL Server into a Hive table using Spark in a Zeppelin notebook. I am running the following code:
%pyspark
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.functions import *
spark = SparkSession.builder \
    .appName('sample') \
    .getOrCreate()

# set url, table, etc.
df = spark.read.format('jdbc') \
    .option('url', url) \
    .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
    .option('dbtable', table) \
    .option('user', user) \
    .option('password', password) \
    .load()
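As an aside, the `url` the snippet expects is a SQL Server JDBC connection string. A minimal sketch of assembling one (the helper name and the host/port/database values are hypothetical, not from the question):

```python
def sqlserver_jdbc_url(host, port, database):
    # JDBC URL format understood by com.microsoft.sqlserver.jdbc.SQLServerDriver
    return f'jdbc:sqlserver://{host}:{port};databaseName={database}'

url = sqlserver_jdbc_url('myhost.example.com', 1433, 'mydb')
# → 'jdbc:sqlserver://myhost.example.com:1433;databaseName=mydb'
```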
However, I keep getting the exception:
...
Py4JJavaError: An error occurred while calling o81.load.
: java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
...
I have been trying to figure this out all day, and I believe something is wrong with how I am trying to set up the driver. I have a driver under
/tmp/sqljdbc42.jar
on the instance. Can you please explain how I can let Spark know where this driver is? I have tried many different ways, both through the shell and through the interpreter editor.
Thanks!
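For anyone hitting the same ClassNotFoundException: the jar sitting at /tmp/sqljdbc42.jar is not enough on its own; the JVM behind the Spark driver has to be told about it. When you build the session yourself outside Zeppelin, a sketch would look like the following (in Zeppelin the SparkSession already exists before your paragraph runs, so these `config` calls are silently ignored there; this is illustrative only):

```python
from pyspark.sql import SparkSession

# Sketch only: register the driver jar when constructing the session yourself.
# In Zeppelin the SparkSession is created by the interpreter, so the jar must
# be supplied via the interpreter settings instead.
spark = (SparkSession.builder
         .appName('sample')
         .config('spark.jars', '/tmp/sqljdbc42.jar')
         .config('spark.driver.extraClassPath', '/tmp/sqljdbc42.jar')
         .getOrCreate())
```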
EDIT
I also should note that I loaded the jar onto my instance through Zeppelin's shell (%sh) using
curl -o /tmp/sqljdbc42.jar http://central.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar
and that I have also tried launching pyspark with the jar on the classpath:
pyspark --driver-class-path /tmp/sqljdbc42.jar --jars /tmp/sqljdbc42.jar
Here is how I fixed this:
scp the driver jar onto the cluster driver node.
Go to the Zeppelin interpreter settings, scroll to the Spark section, and click edit.
Write the complete path to the jar under artifacts, e.g.
/home/Hadoop/mssql-jdbc.jar
and nothing else.
Click save.
Then you should be good!
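Before editing the interpreter, it can be worth confirming that the jar you copied actually contains the driver class. A quick check from Zeppelin's %sh (the path is the one from this answer and will likely differ on your cluster; a jar is just a zip archive):

```shell
# List the jar's contents and look for the driver class;
# no output means the class is not in this jar.
unzip -l /home/Hadoop/mssql-jdbc.jar | grep SQLServerDriver.class
```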
You can add it through the Web UI in the Interpreter settings as follows:
Click Interpreter in the menu
Click the 'edit' button in the Spark interpreter
Add the path to the jar in the artifact field
Then just save and restart the interpreter.
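If you prefer configuration files over the Web UI, an equivalent route is to pass the jar through SPARK_SUBMIT_OPTIONS in zeppelin-env.sh and restart Zeppelin (the conf/ location is an assumption about a standard Zeppelin install):

```shell
# In conf/zeppelin-env.sh; Zeppelin appends these options when it
# launches the Spark interpreter via spark-submit.
export SPARK_SUBMIT_OPTIONS="--jars /tmp/sqljdbc42.jar"
```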
Similar to Tomas, you can add the driver (or any library) using maven in the interpreter:
For example, in your case, you can use
com.microsoft.sqlserver:mssql-jdbc:jar:8.4.1.jre8
in the artifact field.
When you restart the interpreter, it will download and add the dependency for you.
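The artifact field accepts maven coordinates of the form group:artifact[:packaging]:version. A small sketch of how such a coordinate breaks down (the helper is hypothetical, purely illustrative):

```python
def parse_coordinate(coord):
    # group:artifact:version, optionally with a packaging segment,
    # e.g. 'com.microsoft.sqlserver:mssql-jdbc:jar:8.4.1.jre8'
    parts = coord.split(':')
    if len(parts) == 4:
        group, artifact, packaging, version = parts
    else:
        group, artifact, version = parts
        packaging = 'jar'  # default packaging when the segment is omitted
    return {'group': group, 'artifact': artifact,
            'packaging': packaging, 'version': version}

parse_coordinate('com.microsoft.sqlserver:mssql-jdbc:jar:8.4.1.jre8')
# → {'group': 'com.microsoft.sqlserver', 'artifact': 'mssql-jdbc',
#    'packaging': 'jar', 'version': '8.4.1.jre8'}
```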