
How do I connect Spark to a JDBC driver in Zeppelin?

I am trying to pull data from a SQL Server database into a Hive table using Spark in a Zeppelin notebook.

I am trying to run the following code:

%pyspark
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.functions import *

spark = SparkSession.builder \
    .appName('sample') \
    .getOrCreate()

# set url, table, etc.

df = spark.read.format('jdbc') \
    .option('url', url) \
    .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
    .option('dbtable', table) \
    .option('user', user) \
    .option('password', password) \
    .load()

However, I keep getting this exception:

...
Py4JJavaError: An error occurred while calling o81.load.
: java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
...

I have been trying to figure this out all day, and I believe something is wrong with how I am setting up the driver. The driver jar is at /tmp/sqljdbc42.jar on the instance. Can you please explain how I can let Spark know where this driver is? I have tried many different ways, both through the shell and through the interpreter editor.

Thanks!

EDIT

I should also note that I downloaded the jar onto my instance through Zeppelin's shell (%sh) using:

curl -o /tmp/sqljdbc42.jar http://central.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar
pyspark --driver-class-path /tmp/sqljdbc42.jar --jars /tmp/sqljdbc42.jar

Here is how I fixed this:

  1. scp the driver jar onto the cluster's driver node.

  2. Go to the Zeppelin Interpreter page, scroll to the Spark section, and click edit.

  3. Under artifacts, enter the complete path to the jar, e.g. /home/Hadoop/mssql-jdbc.jar, and nothing else.

  4. Click save.

Then you should be good!
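Once the interpreter picks up the jar, the read from the question can be retried. Below is a minimal sketch of how the options might be assembled, not the poster's exact code. The host, port, database, table, and credentials are made-up placeholders, and the helper names sqlserver_jdbc_options and read_table are hypothetical. In a Zeppelin %pyspark paragraph the spark session normally already exists, so it is passed in rather than created.

```python
def sqlserver_jdbc_options(host, port, database, table, user, password):
    """Build the option dict for spark.read.format('jdbc')."""
    # SQL Server JDBC URLs take the form jdbc:sqlserver://host:port;databaseName=db
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return {
        "url": url,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, options):
    """Load a JDBC table into a DataFrame using an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details, for illustration only.
opts = sqlserver_jdbc_options(
    "dbhost.example.com", 1433, "sales", "dbo.orders", "reader", "secret"
)
print(opts["url"])
```

In the notebook this would be used as df = read_table(spark, opts). If the ClassNotFoundException still appears, the interpreter most likely has not been restarted since the artifact was saved.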

You can add it through the web UI in the Interpreter settings as follows:

  • Click Interpreter in the menu.

  • Click the 'edit' button in the Spark interpreter.

  • Add the path to the jar in the artifact field.

  • Then just save and restart the interpreter.
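Before pointing the artifact field at a file, it can help to confirm that the file really is a jar containing the driver class, since a failed download can leave something that is not a valid archive. The following is a rough sketch using only the Python standard library. The helper name jar_contains_class is made up for illustration, and it has to run on the machine where the jar lives.

```python
import zipfile

DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def jar_contains_class(jar_path, class_name=DRIVER_CLASS):
    """Return True if the jar (a zip archive) holds the given class file."""
    # Class a.b.C is stored inside the jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    try:
        with zipfile.ZipFile(jar_path) as jar:
            return entry in jar.namelist()
    except (OSError, zipfile.BadZipFile):
        # Missing file, unreadable file, or not a valid archive.
        return False
```

For the jar from the question, the call would be jar_contains_class("/tmp/sqljdbc42.jar"). A result of False suggests fixing the download or the path before touching the interpreter settings.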

Similar to Tomas's answer, you can add the driver (or any library) using Maven in the interpreter:

  • Click Interpreter in the menu.
  • Click the 'edit' button in the Spark interpreter.
  • In the artifact field, instead of a jar path, add the Maven coordinates as groupId:artifactId:version.

For example, in your case, you can use com.microsoft.sqlserver:mssql-jdbc:jar:8.4.1.jre8 in the artifact field.

When you restart the interpreter, it will download and add the dependency for you.
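To see what such a coordinate points to, here is a small sketch that maps it onto the standard Maven repository layout. It only illustrates the naming convention and is not how Zeppelin resolves dependencies internally. The function name maven_jar_url and the default repository URL https://repo1.maven.org/maven2 are assumptions added for illustration.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"  # assumed default repository


def maven_jar_url(coordinate, repo=MAVEN_CENTRAL):
    """Map groupId:artifactId[:packaging]:version to the artifact's URL."""
    parts = coordinate.split(":")
    if len(parts) == 3:
        group_id, artifact_id, version = parts
        packaging = "jar"
    elif len(parts) == 4:
        group_id, artifact_id, packaging, version = parts
    else:
        raise ValueError(f"unsupported coordinate: {coordinate!r}")
    # Dots in the groupId become directory separators in the repository.
    group_path = group_id.replace(".", "/")
    return (
        f"{repo}/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.{packaging}"
    )


print(maven_jar_url("com.microsoft.sqlserver:mssql-jdbc:jar:8.4.1.jre8"))
```

The same layout is visible in the curl URL from the question, which is why a coordinate in the artifact field can stand in for a manually downloaded file.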
