Loading the JDBC driver for Spark DataFrame 'write' with 'jdbc' from a Python script

Question

I'm trying to load the MySQL JDBC driver from a Python app. I'm not invoking the 'bin/pyspark' or 'spark-submit' programs; instead, I have a Python script in which I initialize the 'SparkContext' and 'SparkSession' objects. I understand that we can pass the '--jars' option when invoking 'pyspark', but how do I load and specify the JDBC driver in my Python app?

Answer 1

I think you want to do something like this:

from pyspark.sql import SparkSession

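# Create a SparkSession with the MySQL JDBC connector JAR on the classpath.
# 'spark.jars' must point at the JAR file(s) and only takes effect if no SparkContext is running yet.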
spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

# Create a small example DataFrame through the session
df = spark.createDataFrame([
    (1, 'Hello'),
    (2, 'World!')
], ['Index', 'Value'])

# Write the DataFrame to the MySQL table over JDBC
df.write.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
              mode='overwrite',
              properties={'user': 'db_user', 'password': 'db_pass'})
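'spark.jars' expects a comma-separated list of JAR file paths rather than a directory, and explicitly naming the driver class in the connection properties is a common way to avoid "No suitable driver" errors. The sketch below assumes MySQL Connector/J 8.x, whose driver class is 'com.mysql.cj.jdbc.Driver' (5.x releases use 'com.mysql.jdbc.Driver'); the helper names are hypothetical and only illustrate how the pieces fit together.

```python
from typing import Dict, Iterable

# Assumed driver class for MySQL Connector/J 8.x (5.x uses "com.mysql.jdbc.Driver").
MYSQL_DRIVER_CLASS = "com.mysql.cj.jdbc.Driver"


def build_spark_jars(jar_paths: Iterable[str]) -> str:
    """Join JAR paths into the comma-separated value that spark.jars expects."""
    paths = [p.strip() for p in jar_paths]
    for p in paths:
        if not p.endswith(".jar"):
            raise ValueError(f"spark.jars entries should be JAR files, got: {p!r}")
    return ",".join(paths)


def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a MySQL JDBC URL such as jdbc:mysql://host:3306/my_db."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def jdbc_properties(user: str, password: str,
                    driver: str = MYSQL_DRIVER_CLASS) -> Dict[str, str]:
    """Connection properties for DataFrameWriter.jdbc, naming the driver class explicitly."""
    return {"user": user, "password": password, "driver": driver}


def write_sample(jar_path: str) -> None:
    """Hypothetical end-to-end run; needs pyspark and a reachable MySQL server."""
    # Imported lazily so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    # spark.jars only takes effect when a new SparkContext is created;
    # getOrCreate() reuses an existing context if one is already running.
    spark = (SparkSession.builder
             .appName("stack_overflow")
             .config("spark.jars", build_spark_jars([jar_path]))
             .getOrCreate())
    df = spark.createDataFrame([(1, "Hello"), (2, "World!")], ["Index", "Value"])
    df.write.jdbc(mysql_jdbc_url("host", 3306, "my_db"), "my_table",
                  mode="overwrite",
                  properties=jdbc_properties("db_user", "db_pass"))
    spark.stop()
```

In a script you would call write_sample with the path to the connector JAR file you downloaded; the host, database, table and credentials are the same placeholders as in the answer above.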

Answer 2

The answer is to create the SparkContext like this:

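from pyspark import SparkConf, SparkContext

spark_conf = SparkConf().set("spark.jars", "/my/path/mysql_jdbc_driver.jar")
sc = SparkContext(conf=spark_conf)  # must be the first SparkContext created in the process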

This will load the MySQL driver onto the classpath.
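The SparkConf snippet yields only a SparkContext, while creating DataFrames and calling 'df.write.jdbc' goes through a SparkSession. A minimal sketch, assuming pyspark 2.x or later, of wrapping the configured context in a session; the helper names 'jar_conf_pairs' and 'session_from_conf' and the default app name are hypothetical.

```python
from typing import List, Tuple


def jar_conf_pairs(jar_paths: List[str], app_name: str = "jdbc_writer") -> List[Tuple[str, str]]:
    """Key/value pairs for SparkConf.setAll; spark.jars takes a comma-separated list."""
    return [("spark.app.name", app_name), ("spark.jars", ",".join(jar_paths))]


def session_from_conf(jar_paths: List[str]):
    """Create a SparkContext with the JARs, then wrap it in a SparkSession.

    Requires pyspark; it is imported lazily so jar_conf_pairs works without it.
    """
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SparkSession

    spark_conf = SparkConf().setAll(jar_conf_pairs(jar_paths))
    # The conf must be supplied when the context is first created; a context
    # that is already running will not pick up new JARs.
    sc = SparkContext.getOrCreate(conf=spark_conf)
    # DataFrame creation and df.write.jdbc go through the SparkSession.
    return SparkSession(sc)
```

For example, spark = session_from_conf(["/my/path/mysql_jdbc_driver.jar"]) gives a session on which you can build a DataFrame and call df.write.jdbc as in the first answer.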

2 answers

Answer 1: score 1, accepted, posted 2019-06-03 22:08:15

Answer 2: score 0, posted 2019-06-03 22:09:38
