
Does the Hadoop stack support analyzing big dynamic data?

I've gone through a few days of tutorials on how to load data into hive. People talk about

CREATE EXTERNAL TABLE

to load data from an external source, and that source is always a static file: .txt, .csv, etc.
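
For context, this is the sort of static-file example those tutorials give. A minimal sketch; the table name, columns, and HDFS path here are hypothetical:

-- Point Hive at an existing directory of CSV files in HDFS
-- (the table, columns, and location are made-up examples).
CREATE EXTERNAL TABLE sales (
  id       INT,
  customer STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/data/sales';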

I want to know whether Hive supports external MS-SQL tables as well (dynamic data), or whether I have to extract all the data from the SQL table into a *.csv and then use that static file for analysis in Hive.

This export is troublesome for tables with millions of rows if it has to be repeated regularly. If dynamic sources are supported, how do I accomplish this?

Update
Sqoop has Incremental Imports, which can keep Hadoop updated with current MS-SQL data:

Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.

Now I need to figure out how this can be run in an automated way.
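
From what I can tell, a saved Sqoop job stores the last-imported value in its metastore between runs, so the automation could be as simple as a cron entry. A minimal sketch, assuming a SQL Server table named orders with an auto-incrementing id column (the host, database, credentials, and paths here are all hypothetical):

# Create a saved job; Sqoop's metastore remembers --last-value between runs.
sqoop job --create orders_sync -- import \
  --connect 'jdbc:sqlserver://dbserver:1433;databaseName=mydb' \
  --username hadoop_etl \
  --password-file /user/etl/.sqlserver.pw \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 0 \
  --target-dir /data/orders

# Each run imports only rows whose id is greater than the stored value;
# schedule this line from cron to keep the HDFS copy current.
sqoop job --exec orders_sync

If rows in the source table are updated rather than only appended, --incremental lastmodified with a timestamp column is the documented alternative.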

I'm pretty sure the tool you'll want to use is Sqoop.

To quote from the Sqoop home page:

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
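
In its simplest form, an import is a single command that copies a whole table from SQL Server into a new Hive table. A minimal sketch (the connection string, credentials, and table name are hypothetical, and the Microsoft SQL Server JDBC driver jar needs to be on Sqoop's classpath):

# One-shot import of an entire SQL Server table straight into Hive;
# -P prompts for the database password interactively.
sqoop import \
  --connect 'jdbc:sqlserver://dbserver:1433;databaseName=mydb' \
  --username hadoop_etl -P \
  --table orders \
  --hive-import \
  --hive-table orders

Combined with the incremental mode mentioned in your update, this removes the manual CSV export step entirely.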
