Spark Streaming: How to efficiently save foreachRDD data into a MySQL database?

We are building a real-time computation system and also want to save the processed data into a MySQL database. Here's the code:

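import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

// splitWordInfo is assumed to be a JavaDStream<String>. Function<JavaRDD<T>, Void>
// is the Spark 1.x foreachRDD overload (Spark 1.6+ also accepts a VoidFunction).
splitWordInfo.foreachRDD(new Function<JavaRDD<String>, Void>() {
        private static final long serialVersionUID = 1L;

        @Override
        public Void call(JavaRDD<String> rdd) throws Exception {
            // The function below runs on the executors, once per partition of this batch's RDD
            rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
                // Default Serial ID
                private static final long serialVersionUID = 1L;
                @Override
                public void call(Iterator<String> eachline) throws Exception {
                    // One listed column, one "?" placeholder: each input line is bound to name
                    String sql = "insert into test_mm(name) values(?)";
                    // try-with-resources closes the statement and connection even if an insert fails
                    try (Connection conn = DriverManager.getConnection("jdbc:mysql://xx.xx.xx.xx:3306/dbname", "user", "pass");
                         PreparedStatement stat = conn.prepareStatement(sql)) {
                        while (eachline.hasNext()) {
                            stat.setString(1, eachline.next());
                            stat.executeUpdate();
                        }
                    }
                }
            });
            return null;
        }
    });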

Will it open/close a MySQL connection for each RDD, or for each partition?

And how can I efficiently save foreachRDD data into a MySQL database? Could anyone help me with this?

Each RDD partition is processed as a separate task, and your program gets a connection for each partition: in every batch, the function passed to foreachPartition runs once per partition, so a connection is opened and closed once per partition, not once per RDD. It is a good idea to use a connection pool library such as HikariCP or the Tomcat JDBC Connection Pool, so that connections are reused instead of being reopened for every partition. But even with a connection pool, every write still has to communicate with the database, and that cost cannot be avoided in this model.
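A minimal sketch of the pooled, per-partition approach, written against the plain JDBC DataSource interface so it does not depend on a specific pool library. The class PartitionWriter, its methods, and the batch size are hypothetical, and inserting only into test_mm(name) is an assumption. The pool is kept in a lazily initialized static field, so it is created on each executor rather than shipped from the driver (pool objects are generally not serializable). The sketch also uses JDBC batching (addBatch/executeBatch), which the answer does not mention: it can reduce the number of round trips per partition but does not remove the database communication cost. With MySQL Connector/J, batched inserts are typically only rewritten into multi-row statements when rewriteBatchedStatements=true is set in the JDBC URL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.function.Supplier;
import javax.sql.DataSource;

// Hypothetical helper meant to be called from inside rdd.foreachPartition(...).
// Static fields exist once per executor JVM, so the pool is built at most once
// per executor and then reused by every partition and every batch interval.
final class PartitionWriter {
    private static volatile DataSource dataSource;

    private PartitionWriter() {}

    // Returns the JVM-wide pool, creating it with the factory on first use
    // (double-checked locking on a volatile field).
    static DataSource getPool(Supplier<DataSource> factory) {
        DataSource ds = dataSource;
        if (ds == null) {
            synchronized (PartitionWriter.class) {
                ds = dataSource;
                if (ds == null) {
                    ds = factory.get();
                    dataSource = ds;
                }
            }
        }
        return ds;
    }

    // Writes one partition through a single borrowed connection, handing the
    // rows to the driver in JDBC batches of batchSize. Returns rows written.
    static int writePartition(DataSource ds, Iterator<String> rows, int batchSize)
            throws SQLException {
        String sql = "insert into test_mm(name) values(?)";
        int written = 0;
        // With a pooling DataSource (e.g. HikariCP), close() hands the
        // connection back to the pool instead of closing the socket.
        try (Connection conn = ds.getConnection();
             PreparedStatement stat = conn.prepareStatement(sql)) {
            int pending = 0;
            while (rows.hasNext()) {
                stat.setString(1, rows.next());
                stat.addBatch();
                if (++pending == batchSize) {
                    stat.executeBatch(); // send the accumulated rows as one batch
                    written += pending;
                    pending = 0;
                }
            }
            if (pending > 0) {
                stat.executeBatch(); // flush the last partial batch
                written += pending;
            }
        }
        return written;
    }
}
```

In the streaming job the call would look roughly like rdd.foreachPartition(rows -> PartitionWriter.writePartition(PartitionWriter.getPool(MyJob::createPool), rows, 500)), where MyJob.createPool is a hypothetical static method that builds a HikariDataSource from a HikariConfig holding the JDBC URL and credentials. Because the method reference is evaluated on the executor, nothing non-serializable has to travel with the closure.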
