We want to develop a simple Java EE web application that analyzes log files using Hadoop. The following is the approach we are following to develop the application, but we are unable to get through it.
We browsed many samples to fulfill some of the steps, but we could not find any concrete sample.
Please advise whether the above approach is correct, and share links to sample applications developed in Java.
I would point out a few things:
a) You need to merge log files, or in some other way make sure that you do not have too many of them. Consider Flume (http://flume.apache.org/), which is built to accept logs from various sources and put them into HDFS.
b) If you go with FTP, you will need some scripting to take the data from the FTP server and put it into HDFS.
c) The main problem I see is running a Hive job as the result of a client's web request. A Hive query is not interactive: it will take at least dozens of seconds, and probably much more.
I would also be wary of concurrent requests: you probably cannot run more than a few in parallel.
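For point c), one possible way to cope with slow queries and limited parallelism is to take the query off the request thread: the web request only submits a job and gets an id back, a small fixed-size pool caps how many queries run at once, and the client polls for the result. The following is a minimal sketch under those assumptions, not a definitive design. The class `ReportJobRunner` and its methods are hypothetical names, and the Hive query itself is represented by a plain `Callable<String>`, so no real Hive call is made here.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: runs slow report queries in the background with capped parallelism. */
final class ReportJobRunner {
    private final ExecutorService pool;
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    ReportJobRunner(int maxParallelQueries) {
        // A fixed pool queues extra submissions instead of starting more queries.
        this.pool = Executors.newFixedThreadPool(maxParallelQueries);
    }

    /** Called from the web request: returns immediately with a job id. */
    String submit(Callable<String> query) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, pool.submit(query));
        return jobId;
    }

    /** Called when the client polls: true once the job has finished or failed. */
    boolean isDone(String jobId) {
        Future<String> job = jobs.get(jobId);
        return job != null && job.isDone();
    }

    /** Blocks until the job finishes and returns its result. */
    String result(String jobId) {
        Future<String> job = jobs.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job: " + jobId);
        }
        try {
            return job.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // The query itself failed; surface its cause to the caller.
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A servlet would call `submit(...)` with a `Callable` that runs the actual query, return the job id to the browser, and expose a second endpoint that checks `isDone(jobId)`. Inside a Java EE container, a container-managed executor is usually preferable to creating threads directly; the plain `Executors` pool is used here only to keep the sketch self-contained.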
In my opinion, you can do the following:
1) Instead of accepting logs from various sources and putting them into HDFS, you can put them into one database, say SQL Server, and from there import your data into Hive (or HDFS) using Sqoop.
2) This will reduce the effort of writing the various jobs to bring the data into HDFS.
3) Once the data is in Hive, you can do whatever you want.
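The Sqoop step above could be driven from Java by assembling a `sqoop import` command line. The sketch below only builds the argument list; it assumes Sqoop is installed on the machine that would run it and that the SQL Server JDBC driver is available to Sqoop. The class name `SqoopImportCommand` and every connection value shown in the usage note are hypothetical placeholders, not values from the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the arguments for a `sqoop import` into a Hive table. */
final class SqoopImportCommand {

    static List<String> build(String jdbcUrl, String user, String passwordFile,
                              String sourceTable, String hiveTable) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sqoop");
        cmd.add("import");
        cmd.add("--connect");
        cmd.add(jdbcUrl);           // JDBC URL of the source database
        cmd.add("--username");
        cmd.add(user);
        cmd.add("--password-file");
        cmd.add(passwordFile);      // file holding the password, kept off the command line
        cmd.add("--table");
        cmd.add(sourceTable);       // table in the source database
        cmd.add("--hive-import");   // load the imported data into Hive
        cmd.add("--hive-table");
        cmd.add(hiveTable);         // target Hive table
        return cmd;
    }
}
```

For example, `SqoopImportCommand.build("jdbc:sqlserver://dbhost:1433;databaseName=logs", "loader", "/user/loader/.pw", "access_log", "access_log")` returns the list to hand to `new ProcessBuilder(cmd).inheritIO().start()`. In practice the same command is often simply run from a scheduled shell script instead.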