What is the best way to save the output of a Spark JavaRDD into a database?
Should I write Spark Java code that saves the RDD into the database directly? What would be the drawbacks of that approach?
Or should I use Sqoop to load the output files into the database?
Is there any other way to do this?
Thanks
I used a DataFrame and saved the data into SQL Server:
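import java.util.Properties;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SQLContext;

// Convert the RDD of WebHttpOutPutVO beans into a DataFrame
SQLContext sqlContext = new SQLContext(context);
DataFrame outDataFrame = sqlContext.createDataFrame(finalOutPutRDD, WebHttpOutPutVO.class);
// JDBC connection properties for SQL Server
Properties prop = new Properties();
prop.setProperty("database", "Web_Session");
prop.setProperty("user", "user");
prop.setProperty("password", "pwd@123");
prop.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver");
// Append the rows to test_table over JDBC
outDataFrame.write().mode(SaveMode.Append).jdbc("jdbc:sqlserver://<Host>:1433", "test_table", prop);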
There are two approaches you can use to write your results back to the database:
1. Use something like DBOutputFormat and configure it.
2. Use foreachPartition on the RDD you want to save, and pass in a function that creates a connection to MySQL and writes the results back.
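The foreachPartition approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name PartitionJdbcWriter, the table test_table with columns url and hits, the String[] row shape, and the batch size are all hypothetical and not from the posts above. The helper uses only java.sql so it compiles on its own; the Spark call that would invoke it is shown in a comment, with url, user, and pwd as placeholders.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

public class PartitionJdbcWriter {

    // Hypothetical target table and columns, for illustration only.
    static final String INSERT_SQL =
            "INSERT INTO test_table (url, hits) VALUES (?, ?)";

    /**
     * Writes all rows of one partition through a single connection,
     * sending the inserts in batches of batchSize.
     * Returns the number of rows handed to the database.
     */
    public static int savePartition(Iterator<String[]> rows,
                                    Connection conn,
                                    int batchSize) throws SQLException {
        int written = 0;
        int pending = 0;
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            while (rows.hasNext()) {
                String[] row = rows.next();
                stmt.setString(1, row[0]);
                stmt.setString(2, row[1]);
                stmt.addBatch();
                pending++;
                written++;
                if (pending == batchSize) {
                    stmt.executeBatch(); // flush a full batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                stmt.executeBatch(); // flush the remainder
            }
        }
        return written;
    }

    // Assumed Spark usage (needs spark-core and a JDBC driver on the classpath);
    // rdd is a JavaRDD<String[]>, and url, user, pwd are placeholders:
    //
    // rdd.foreachPartition(rows -> {
    //     try (Connection conn = DriverManager.getConnection(url, user, pwd)) {
    //         PartitionJdbcWriter.savePartition(rows, conn, 1000);
    //     }
    // });
}
```

Opening the connection inside foreachPartition, rather than on the driver, means each executor creates its own connection (JDBC connections are not serializable) and pays that cost once per partition instead of once per record.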