Spark Streaming: How to efficiently save foreachRDD data into a MySQL database?
We are building a real-time computation system and also want to save the processed data into a MySQL database. Here is the code:
splitWordInfo.foreachRDD(new Function<JavaRDD<String>, Void>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Void call(JavaRDD<String> rdd) throws Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
            // Default Serial ID
            private static final long serialVersionUID = 1L;

            @Override
            public void call(Iterator<String> eachline) throws Exception {
                // NOTE: two columns are listed but only one placeholder is bound;
                // MySQL rejects this statement as written.
                String sql = "insert into test_mm(name,addr) values(?)";
                Connection conn = DriverManager.getConnection("jdbc:mysql://xx.xx.xx.xx:3306/dbname", "user", "pass");
                PreparedStatement stat = conn.prepareStatement(sql);
                while (eachline.hasNext()) {
                    stat.setString(1, eachline.next());
                    stat.executeUpdate();
                }
                stat.close();
                conn.close();
            }
        });
        return null;
    }
});
Will this open and close a MySQL connection for each RDD, or for each partition?
And how can I efficiently save foreachRDD data into a MySQL database? Could anyone help me out?
Each RDD partition is processed as a separate task, so your program opens one connection per partition. It is a good idea to use a connection pool library such as Hikari or Tomcat. Even with a connection pool, though, there is still a cost for communicating with the database, and you cannot avoid that in this model.
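The communication cost cannot be removed, but it can usually be reduced by sending rows in JDBC batches instead of calling executeUpdate() once per row. Below is a minimal sketch of that idea, not a definitive implementation. The class name PartitionBatchWriter, the method writePartition, and the batchSize parameter are hypothetical names introduced for illustration; only the standard java.sql calls (prepareStatement, setString, addBatch, executeBatch) come from JDBC itself. It assumes a single-placeholder insert statement and that the caller supplies and closes the Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

// Hypothetical helper: writes one partition's rows over a single connection.
final class PartitionBatchWriter {
    private PartitionBatchWriter() {}

    /**
     * Binds each row to the first placeholder of {@code sql}, queues it with
     * addBatch(), and flushes with executeBatch() every {@code batchSize} rows.
     * Returns the number of rows queued. The caller owns the connection.
     */
    static int writePartition(Connection conn, String sql,
                              Iterator<String> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        int pending = 0;
        // try-with-resources closes the statement even if an insert fails.
        try (PreparedStatement stat = conn.prepareStatement(sql)) {
            while (rows.hasNext()) {
                stat.setString(1, rows.next());
                stat.addBatch();
                pending++;
                total++;
                if (pending == batchSize) {
                    stat.executeBatch(); // one flush for a full batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                stat.executeBatch(); // flush the final partial batch
            }
        }
        return total;
    }
}
```

Inside call(Iterator<String> eachline) from the question, this could be invoked as PartitionBatchWriter.writePartition(conn, sql, eachline, 500), with conn borrowed from a connection pool and returned to it afterwards. The batch size of 500 is an arbitrary starting point, not a recommendation from the original answer.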