REST API vs Sqoop

I was trying to import data from MySQL into HDFS. I was able to do it with Sqoop, but the same thing can also be done by fetching the data from an API.

My question is: when should I use a REST API to load data into HDFS instead of Sqoop?

Please describe some differences, with use cases!

Sqoop (SQL <=> Hadoop) is mainly used for loading data from an RDBMS into HDFS.

It connects directly to the database. If privileges are not properly restricted for the database user that Sqoop connects as, that user can append, modify, or delete data in the table(s) using the sqoop eval command.
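As a rough sketch of what this looks like in practice, the Python snippet below only builds (and prints) the argument lists for a `sqoop import` and a `sqoop eval` call; it does not run them. The JDBC URL, user name, table, and target directory are made-up placeholders. The point is that `sqoop eval` passes whatever SQL it is given straight to the database, so only the database grants stand between the user and a `DELETE`.

```python
import shlex

# Hypothetical connection details, for illustration only.
JDBC_URL = "jdbc:mysql://db.example.com/shop"
DB_USER = "etl_user"


def sqoop_import_args(table, target_dir, mappers=4):
    """Build the argument list for a basic table import into HDFS."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", DB_USER,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def sqoop_eval_args(sql):
    """Build the argument list for running an arbitrary SQL statement."""
    return [
        "sqoop", "eval",
        "--connect", JDBC_URL,
        "--username", DB_USER,
        "-P",
        "--query", sql,
    ]


import_cmd = sqoop_import_args("users", "/data/users")
eval_cmd = sqoop_eval_args("SELECT COUNT(*) FROM users")

# Print the commands as they would be typed in a shell.
print(shlex.join(import_cmd))
print(shlex.join(eval_cmd))
```

Nothing in the second command restricts the statement to a `SELECT`; that restriction has to come from the privileges of the database user.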

With a REST web service API, on the other hand, we can fetch data from various databases (NoSQL or RDBMS) that the service connects to internally in its code.

For example, suppose you call a getUsersData RESTful web service with a curl command. The service is designed only to return users' data, and it does not let the caller append, modify, or delete anything in the database, regardless of the database type (RDBMS/NoSQL).
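To make the contrast concrete, here is a minimal sketch of such a read-only service, with the assumptions stated up front: an in-memory SQLite database stands in for the real backend, there is no actual HTTP server (the `handle` function just mimics request dispatch), and every name except getUsersData is invented for the example. The only operation the caller can reach is one fixed `SELECT`.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory database standing in for the real backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    conn.commit()
    return conn


def get_users_data(conn):
    """The only query the service exposes: a fixed, read-only SELECT."""
    rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    return [{"id": row[0], "name": row[1]} for row in rows]


# The whole API surface: one GET route. Nothing here accepts SQL from the caller.
ROUTES = {("GET", "/getUsersData"): get_users_data}


def handle(conn, method, path):
    """Dispatch a request; anything outside ROUTES is rejected."""
    handler = ROUTES.get((method, path))
    if handler is None:
        known_paths = {route_path for _, route_path in ROUTES}
        # 405 for a known path with the wrong method, 404 for an unknown path.
        return (405 if path in known_paths else 404), ""
    return 200, json.dumps(handler(conn))


conn = make_db()
print(handle(conn, "GET", "/getUsersData"))
print(handle(conn, "DELETE", "/getUsersData"))
```

Whatever the caller sends, the table contents stay the same, because the service code, not the caller, decides which statements reach the database.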

You could use Sqoop to pull data from MySQL into HBase, then put a REST API over HBase (on Hadoop)... That would not be much different from a REST API over MySQL.

Basically, you're comparing two different things. Hadoop is not meant to replace traditional databases or N-tier user-facing applications; it is just a more distributed, fault-tolerant place to store large amounts of data.

And you typically wouldn't use a REST API to talk to a database and then put those values into Hadoop, because that wouldn't be distributed: all database results would go through a single process.
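For contrast, Sqoop parallelizes an import by dividing the range of a split column among several map tasks, each of which reads its own slice from the database. The sketch below is a simplified illustration of that idea for an integer key, not Sqoop's actual splitting code; the function names and the `id` column are assumptions made for the example.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous
    chunks, at most one per mapper, with sizes differing by at most one."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer keys than mappers: the remaining mappers get nothing
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(ranges, column="id"):
    """Turn each range into the condition one mapper would query with."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


ranges = split_ranges(1, 1000, 4)
for clause in where_clauses(ranges):
    print(clause)
```

Each of those conditions can be read by a separate task writing its own file to HDFS, whereas a single REST client pulls every row through one connection and one process.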
