Spark JavaRDD output to database

Please help me understand the best way to save the output of a Spark JavaRDD into a database.

Should I write Spark Java code to save the RDD into the database? What would be the drawbacks of this approach?

Or should I use Sqoop to load the output files into the database?

Is there any other way to do this?

Thanks

I used a DataFrame and saved the data into SQL Server:

// Convert the RDD of WebHttpOutPutVO beans into a DataFrame (Spark 1.x API).
SQLContext sqlContext = new SQLContext(context);
DataFrame outDataFrame = sqlContext.createDataFrame(finalOutPutRDD, WebHttpOutPutVO.class);

// JDBC connection properties for SQL Server.
Properties prop = new java.util.Properties();
prop.setProperty("database", "Web_Session");
prop.setProperty("user", "user");
prop.setProperty("password", "pwd@123");
prop.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver");

// Append the rows to test_table over JDBC.
outDataFrame.write().mode(org.apache.spark.sql.SaveMode.Append).jdbc("jdbc:sqlserver://<Host>:1433", "test_table", prop);

There are two approaches you can use to write your results back to the database.

  1. Use an output format such as DBOutputFormat and configure it with the connection details.

  2. Call foreachPartition on the RDD you want to save and pass in a function that opens a connection to the database (MySQL, for example) and writes the results back, so that one connection is opened per partition rather than per record.
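
The second approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name PartitionJdbcWriter, the column names session_id and url, the two-element String[] row shape, and the batch size are all hypothetical and not from the original posts. The class itself uses only java.sql so it can be compiled without Spark; the Spark call is shown as a comment at the end.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

/** Writes one partition's rows over a single JDBC connection, in batches. */
class PartitionJdbcWriter {

    // Hypothetical columns, for illustration only.
    static final String INSERT_SQL =
        "INSERT INTO test_table (session_id, url) VALUES (?, ?)";

    /**
     * Inserts every row from the iterator and returns the number of rows read.
     * Each row is assumed to be a two-element array: {session_id, url}.
     */
    static int writePartition(Iterator<String[]> rows, Connection conn, int batchSize)
            throws SQLException {
        int written = 0;
        int pending = 0;
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            while (rows.hasNext()) {
                String[] row = rows.next();
                stmt.setString(1, row[0]);
                stmt.setString(2, row[1]);
                stmt.addBatch();
                written++;
                // Flush a full batch so memory use stays bounded.
                if (++pending == batchSize) {
                    stmt.executeBatch();
                    pending = 0;
                }
            }
            // Flush whatever is left over from the last partial batch.
            if (pending > 0) {
                stmt.executeBatch();
            }
        }
        return written;
    }
}

// Wiring it into Spark (not compiled here; assumes a JavaRDD<String[]> named rdd
// and placeholder url, user, and password strings):
//
// rdd.foreachPartition(rows -> {
//     try (Connection conn = DriverManager.getConnection(url, user, password)) {
//         PartitionJdbcWriter.writePartition(rows, conn, 1000);
//     }
// });
```

The connection is opened inside the function passed to foreachPartition because JDBC connections are not serializable, so they have to be created on the executor rather than on the driver.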


 