
Limit number of connections to MySQL database using JDBC driver in Spark

I am currently importing data from a MySQL database into Spark via the JDBC driver, using the following command in pyspark:

dataframe_mysql = (sqlctx
    .read
    .format("jdbc")
    .option("url", "jdbc:mysql://<IP-ADDRESS>:3306/<DATABASE>")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", "<TABLE>")
    .option("user", "<USER>")
    .option("password", "<PASSWORD>")
    .load())

When I run the Spark job, I get the following error message:

com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException (Too many connections).

It seems that because several nodes attempt to connect to the database concurrently, I am exceeding MySQL's connection limit (151), and this is causing my job to run slower.

How can I limit the number of connections that the JDBC driver uses in pyspark? Any help would be great!

Try to use the numPartitions param. According to the documentation, it is the maximum number of partitions that can be used for parallelism in table reading and writing; it also determines the maximum number of concurrent JDBC connections. If the number of partitions to write exceeds this limit, Spark decreases it to this limit by calling coalesce(numPartitions) before writing.
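
For a read, numPartitions only takes effect together with partitionColumn, lowerBound, and upperBound. Here is a minimal sketch of the original read capped at 10 concurrent connections, assuming the table has a numeric column named id to partition on; the column name, bounds, and partition count are illustrative, not values from the question:

    # Sketch: capped JDBC read. "id", the bounds, and 10 partitions are
    # assumptions for illustration only.
    dataframe_mysql = (sqlctx
        .read
        .format("jdbc")
        .option("url", "jdbc:mysql://<IP-ADDRESS>:3306/<DATABASE>")
        .option("driver", "com.mysql.jdbc.Driver")
        .option("dbtable", "<TABLE>")
        .option("user", "<USER>")
        .option("password", "<PASSWORD>")
        .option("partitionColumn", "id")  # numeric column to split on
        .option("lowerBound", "1")        # lower end of the split range
        .option("upperBound", "1000000")  # upper end of the split range
        .option("numPartitions", "10")    # at most 10 partitions -> at most 10 connections
        .load())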

I think you should reduce the default number of partitions, or reduce the number of executors.
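
For instance, here is a hedged sketch of both suggestions, reusing the placeholders from the question; the partition and executor counts are assumed values for illustration:

    # Sketch: each partition writing over JDBC opens its own connection,
    # so coalescing first caps write concurrency (10 is an assumed value).
    (dataframe_mysql
        .coalesce(10)
        .write
        .format("jdbc")
        .option("url", "jdbc:mysql://<IP-ADDRESS>:3306/<DATABASE>")
        .option("driver", "com.mysql.jdbc.Driver")
        .option("dbtable", "<TABLE>")
        .option("user", "<USER>")
        .option("password", "<PASSWORD>")
        .mode("append")
        .save())
    # Capping executors limits concurrency cluster-wide, e.g. by submitting
    # with --num-executors 4, or setting spark.executor.instances=4.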
