
Hadoop with Hive

We want to develop a simple Java EE web application that analyzes log files using Hadoop. Below is the approach we plan to follow, but we have not been able to get it working.

  1. The log file is uploaded from the client machine to the Hadoop server using SFTP/FTP.
  2. A Hadoop job is called to fetch the log file and process it into HDFS.
  3. While the log file is processed, its content is stored in a Hive database.
  4. The client web application searches the log content over a Hive JDBC connection.
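For reference, step 4 might look roughly like the sketch below. It is only an illustration under assumptions: the table `web_logs`, the column `log_line`, and the HiveServer2-style URL `jdbc:hive2://host:10000/default` are hypothetical placeholders, and the Hive JDBC driver jar has to be on the classpath for the connection to work.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class HiveLogSearch {
    // Hypothetical table and column names; adjust to the real Hive schema.
    static final String SEARCH_SQL =
            "SELECT log_line FROM web_logs WHERE log_line LIKE ? LIMIT 100";

    // Wraps the keyword in SQL wildcards so LIKE matches it anywhere in the line.
    static String toLikePattern(String keyword) {
        return "%" + keyword + "%";
    }

    // Runs the search on an already opened connection and collects matching lines.
    static List<String> search(Connection conn, String keyword) throws SQLException {
        List<String> hits = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_SQL)) {
            ps.setString(1, toLikePattern(keyword));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    hits.add(rs.getString(1));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.out.println("usage: HiveLogSearch <jdbc-url> <keyword>");
            System.out.println("e.g.   HiveLogSearch jdbc:hive2://host:10000/default ERROR");
            return;
        }
        // Empty user name and password are an assumption for an unsecured test cluster.
        try (Connection conn = DriverManager.getConnection(args[0], "", "")) {
            for (String line : search(conn, args[1])) {
                System.out.println(line);
            }
        }
    }
}
```

In a web application the `search` method would be called from a servlet or service class, with the connection taken from whatever pool the application already uses.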

We browsed many samples that cover some of these steps, but we could not find a concrete, complete sample.

Please tell us whether the above approach is correct, and point us to links for sample applications developed in Java.

I would point out a few things:
a) You need to merge log files, or otherwise make sure you do not end up with too many of them. Consider Flume (http://flume.apache.org/), which is built to accept logs from various sources and put them into HDFS.
b) If you go with FTP, you will need some scripting to take the data from the FTP server and put it into HDFS.
c) The main problem I see is running a Hive job as the result of a client's web request. A Hive query is not interactive: it will take at least dozens of seconds, and probably much more.
I would also be wary of concurrent requests: you probably cannot run more than a few in parallel.
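One way to act on point c) is to cap how many Hive queries the web tier starts at once. The following is a minimal sketch using only `java.util.concurrent`; the class name `HiveQueryGate` and the limits are made up for illustration, and the query itself is passed in as a plain `Supplier`, so nothing here depends on Hive.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class HiveQueryGate {
    private final Semaphore permits;
    private final long waitMillis;

    // maxConcurrent: how many queries may run at once (an assumed tuning knob).
    // waitMillis: how long a request waits for a free slot before it is rejected.
    public HiveQueryGate(int maxConcurrent, long waitMillis) {
        this.permits = new Semaphore(maxConcurrent, true);
        this.waitMillis = waitMillis;
    }

    // Runs the query if a slot frees up in time, otherwise fails fast.
    public <T> T run(Supplier<T> query) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a query slot", e);
        }
        if (!acquired) {
            throw new IllegalStateException("Too many Hive queries in flight, try again later");
        }
        try {
            return query.get();
        } finally {
            permits.release();
        }
    }

    // Number of free slots right now; handy for monitoring.
    public int available() {
        return permits.availablePermits();
    }
}
```

A web request would then call something like `gate.run(() -> runHiveSearch(keyword))`, where `runHiveSearch` is your own method. This does not make Hive any faster; it only keeps a burst of requests from piling up jobs on the cluster.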

In my opinion, you could do the following:

1) Instead of accepting logs from various sources and putting them directly into HDFS, you can load them into one database, say SQL Server, and from there import the data into Hive (or HDFS) using Sqoop.

2) This will reduce the effort of writing separate jobs to bring the data into HDFS.

3) Once the data is in Hive, you can do whatever you want with it.
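If the import is to be triggered from the Java side, the Sqoop command line could be assembled as in the sketch below. The connection string, user, password file, and table names are hypothetical placeholders; the flags used (`--connect`, `--username`, `--password-file`, `--table`, `--hive-import`, `--hive-table`) are standard `sqoop import` options, but check them against the Sqoop version you install.

```java
import java.util.ArrayList;
import java.util.List;

public class SqoopHiveImport {
    // Builds the argument list for "sqoop import ... --hive-import".
    static List<String> buildCommand(String jdbcUrl, String user, String passwordFile,
                                     String sourceTable, String hiveTable) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sqoop");
        cmd.add("import");
        cmd.add("--connect");
        cmd.add(jdbcUrl);
        cmd.add("--username");
        cmd.add(user);
        cmd.add("--password-file");
        cmd.add(passwordFile);
        cmd.add("--table");
        cmd.add(sourceTable);
        cmd.add("--hive-import");
        cmd.add("--hive-table");
        cmd.add(hiveTable);
        return cmd;
    }

    public static void main(String[] args) {
        // All values below are made-up examples.
        List<String> cmd = buildCommand(
                "jdbc:sqlserver://dbhost:1433;databaseName=logs",
                "loguser",
                "/user/loguser/.sqoop-password",
                "web_logs",
                "web_logs");
        // Only prints the command. To run it, one could use:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println(String.join(" ", cmd));
    }
}
```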
