PySpark read JDBC giving errors. How to fix?

I am connecting to RDS MySQL from PySpark using JDBC. I have tried almost everything I could find on Stack Overflow for debugging, but I still cannot make it work.

from pyspark.sql import SparkSession

# mysql_jar is the local path to the MySQL Connector/J jar
spark = SparkSession.builder.config("spark.jars", mysql_jar) \
    .master("local[*]").appName("PySpark_MySQL_test").getOrCreate()
df = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://hostname.amazonaws.com:1150/dbname?user=user_name&password=password") \
    .option("driver", "com.mysql.cj.jdbc.Driver").option("dbtable", "table_name").load()

I have tried the same connection details with Python's pymysql library, and it connects and returns results.
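
For reference, a minimal pymysql connectivity check along these lines; the host, port, and credentials are placeholders standing in for the real values from the JDBC URL above:

import pymysql

# Placeholder connection details matching the JDBC URL above
conn = pymysql.connect(
    host="hostname.amazonaws.com",
    port=1150,
    user="user_name",
    password="password",
    database="dbname",
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())  # (1,) confirms the server is reachable
conn.close()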
But here I get the error below and am unable to solve it.

raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
    at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:827)
    at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:447)
    at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:237)
    at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:199)
    at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
    at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:68)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:62)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure

I have experienced the same issue, and it works for me now. The root cause is that Spark opens the JDBC connection from the master node but executes tasks on the worker nodes, so the master can connect to MySQL while the workers raise a communications link failure. Based on that, open the security rules on the MySQL side so that every Spark node can reach MySQL.
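
As an illustration, a sketch of widening the RDS security group with boto3 so the workers' address range is allowed in; the group ID, port, and CIDR below are placeholder assumptions, and the same change can be made from the AWS console instead:

import boto3

ec2 = boto3.client("ec2")

# Placeholders: the security group attached to the RDS instance, the MySQL
# port from the JDBC URL, and a CIDR covering all Spark nodes.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 1150,
        "ToPort": 1150,
        "IpRanges": [{"CidrIp": "10.0.0.0/16", "Description": "Spark nodes"}],
    }],
)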

For anyone coming here for an answer while using Docker, give the solution below a try. Use the following configuration:

# host.docker.internal resolves to the Docker host from inside the container;
# allowPublicKeyRetrieval=true lets Connector/J fetch the server's public key
# when SSL is disabled.
source_df = spark.read.format('jdbc').options(
        url='jdbc:mysql://host.docker.internal:3306/superset?useSSL=false&allowPublicKeyRetrieval=true',
        driver='com.mysql.cj.jdbc.Driver',
        dbtable='table',
        user='root',
        password='root').load()
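
A quick sanity check after loading, for example source_df.printSchema() or source_df.show(5), confirms the connection works end to end.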

I tried the host as localhost, 127.0.0.1, and even the IP address from docker inspect, but none of them worked; changing it to host.docker.internal made it work.
