
ETL process to transfer data from one DB to another using Apache Spark

I need to create an ETL process that will extract, transform, and then load 100+ tables from several instances of SQL Server to as many instances of Oracle, in parallel, on a daily basis. I understand that I can create multiple threads in Java to accomplish this, but if all of them run on the same machine this approach won't scale. Another approach could be to get a bunch of EC2 instances and start transferring the tables for each database instance on a different EC2 instance. With this approach, though, I would have to take care of "elasticity" myself by adding/removing machines from my pool.

Somehow I think I can use "Apache Spark on Amazon EMR" to accomplish this, but in the past I've used Spark only to handle data on HDFS/Hive, so I'm not sure whether transferring data from one DB to another DB is a good use case for Spark - or is it?

Starting from your last question, "not sure if transferring data from one DB to another DB is a good use case for Spark":

It is, within the limitations of the Spark JDBC connector. There are some limitations, such as the missing support for updates (the connector can only append to or overwrite a target table, not update existing rows), and limited parallelism when reading a table (it requires splitting the table on a numeric column).
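
For illustration, here is a minimal sketch of a single-table copy using the Spark JDBC data source. The host names, credentials, table names, and the "id" column bounds are placeholder assumptions; the partition bounds and counts would need tuning per table:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("mssql-to-oracle").getOrCreate()

    // Read one SQL Server table, split into 8 parallel partitions on a numeric column.
    // partitionColumn/lowerBound/upperBound/numPartitions must be given together.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://mssql-host:1433;databaseName=src_db")
      .option("dbtable", "dbo.orders")
      .option("user", "etl_user")
      .option("password", sys.env("MSSQL_PASSWORD"))
      .option("partitionColumn", "id")   // assumed numeric surrogate key
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Write to Oracle; only append/overwrite save modes are available, not updates.
    df.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
      .option("dbtable", "TARGET_SCHEMA.ORDERS")
      .option("user", "etl_user")
      .option("password", sys.env("ORACLE_PASSWORD"))
      .mode("overwrite")
      .save()

The JDBC drivers for both databases would have to be on the classpath (e.g. passed via --jars to spark-submit).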

Considering the I/O cost and the limited throughput of an RDBMS, running the jobs one after another in FIFO mode does not sound like a good idea: a single table copy is unlikely to saturate the cluster. Instead, you can submit each job with a configuration that requests 1/x of the cluster's resources, so that x tables are processed in parallel.
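
One way to get that parallelism from a single driver is to launch the per-table copies from multiple threads with the FAIR scheduler enabled (spark.scheduler.mode=FAIR), so concurrent jobs share the executors instead of queuing. A sketch, assuming the SparkSession from above and a hypothetical copyTable helper that wraps the JDBC read/write:

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration
    import org.apache.spark.sql.SparkSession

    // Hypothetical helper: one JDBC read from SQL Server plus write to Oracle,
    // as sketched above.
    def copyTable(spark: SparkSession, table: String): Unit = {
      // ... read/write logic per table ...
    }

    // Run up to 4 table copies concurrently from the driver.
    val parallelism = 4
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutor(Executors.newFixedThreadPool(parallelism))

    val tables = Seq("dbo.orders", "dbo.customers", "dbo.products")

    val jobs: Seq[Future[Unit]] = tables.map { table =>
      Future {
        // Jobs submitted from different threads land in the "etl" pool and
        // share cluster resources under the FAIR scheduler.
        spark.sparkContext.setLocalProperty("spark.scheduler.pool", "etl")
        copyTable(spark, table)
      }
    }

    Await.result(Future.sequence(jobs), Duration.Inf)

Alternatively, each table (or group of tables) can be its own spark-submit sized to roughly 1/x of the cluster via --executor-cores and --executor-memory; on EMR, autoscaling of the cluster would then address the "elasticity" concern from the question.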
