
Error while writing a PySpark DataFrame to a MySQL database

I am getting the following error while writing a PySpark DataFrame to a MySQL database:

"Caused by: java.lang.NoSuchMethodException: org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.<init>()"

spark-submit command:

spark-submit --deploy-mode client --master yarn --conf spark.pyspark.python=/usr/bin/python3 --packages mysql:mysql-connector-java:8.0.12 s3://aramark-files/test_pyspark.py

And I am writing using:

df.write.jdbc(url="jdbc:mysql://dbhost/dbname", table="tablename", mode="append", properties={"user":"dbuser", "password": "s3cret"})

Below is the error I get after running the above spark-submit command:

Traceback (most recent call last):
  File "/mnt/tmp/spark-8bb457ce-fc88-4384-af58-9e52e2d6e21a/test_pyspark.py", line 51, in <module>
    df.write.jdbc(jdbcUrl, where, mode='append', properties=dbProperties)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 942, in jdbc
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o79.jdbc.
: java.lang.InstantiationException: org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper
    at java.lang.Class.newInstance(Class.java:427)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:53)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:55)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:499)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodException: org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.<init>()
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.newInstance(Class.java:412)
    ... 34 more

I ran across the same problem in the Scala API. I'm reading from and writing to an Oracle 12c database, and both the DataFrameReader and the DataFrameWriter require the "driver" property to be set (in my case to "oracle.jdbc.OracleDriver"). Otherwise the reader fails with "No suitable driver" and the writer fails with NoSuchMethodException.
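Translated to this question's setup (PySpark and MySQL), the reading side would look roughly like the sketch below. The helper names `MYSQL_DRIVER`, `mysql_jdbc_options` and `read_mysql_table` are hypothetical; the option keys `url`, `dbtable`, `user`, `password` and `driver` are the standard ones of Spark's "jdbc" data source. It assumes MySQL Connector/J 8.x, whose driver class is `com.mysql.cj.jdbc.Driver`, is on the classpath (for example via the `--packages` flag from the question).

```python
# Assumed driver class for MySQL Connector/J 8.x (5.x used com.mysql.jdbc.Driver).
MYSQL_DRIVER = "com.mysql.cj.jdbc.Driver"


def mysql_jdbc_options(url, table, user, password, driver=MYSQL_DRIVER):
    """Build options for Spark's "jdbc" data source with an explicit driver class."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        # Setting "driver" explicitly means Spark does not have to infer the
        # driver class from java.sql.DriverManager.
        "driver": driver,
    }


def read_mysql_table(spark, url, table, user, password):
    """Load a MySQL table into a DataFrame through the DataFrameReader."""
    return (
        spark.read.format("jdbc")
        .options(**mysql_jdbc_options(url, table, user, password))
        .load()
    )
```

With the placeholder values from the question, this would be called as `read_mysql_table(spark, "jdbc:mysql://dbhost/dbname", "tablename", "dbuser", "s3cret")`.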

I would therefore suggest you try:

df.write.jdbc(url="jdbc:mysql://dbhost/dbname", table="tablename", mode="append", properties={"user":"dbuser", "password": "s3cret", "driver": "com.mysql.cj.jdbc.Driver" })

Here I've substituted the MySQL driver class name (com.mysql.cj.jdbc.Driver) from the docs.
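If you write to MySQL from several places, the suggested call can be wrapped in a small function so the "driver" property is never forgotten. This is a minimal sketch: the helper names are hypothetical, only `DataFrameWriter.jdbc` and its `url`, `table`, `mode` and `properties` parameters come from PySpark, and the driver class again assumes Connector/J 8.x.

```python
# Assumed driver class for MySQL Connector/J 8.x, as in the suggested fix.
MYSQL_DRIVER = "com.mysql.cj.jdbc.Driver"


def jdbc_properties(user, password, driver=MYSQL_DRIVER):
    """Connection properties for DataFrameWriter.jdbc, always including "driver"."""
    return {"user": user, "password": password, "driver": driver}


def write_to_mysql(df, url, table, user, password, mode="append"):
    """Write `df` to a MySQL table; `mode` is passed straight to DataFrameWriter.jdbc."""
    df.write.jdbc(
        url=url,
        table=table,
        mode=mode,
        properties=jdbc_properties(user, password),
    )
```

Calling `write_to_mysql(df, "jdbc:mysql://dbhost/dbname", "tablename", "dbuser", "s3cret")` and submitting with the same `--packages mysql:mysql-connector-java:8.0.12` flag should be equivalent to the one-line `df.write.jdbc(...)` call above.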
