
Servlet executes a Hadoop MapReduce job and displays the result

I have a Tomcat server with several servlets, a MapReduce job (written using Hadoop), and Pig installed, all sitting in the same cluster as Hadoop.

Now I need my servlet to execute a MapReduce program (or a Pig script) and display the results it returns. Is there any way for a servlet to execute a MapReduce job and get back the results?

++ I think my servlet could execute a MapReduce job (or a Pig script) simply by calling Runtime.exec or ProcessBuilder. If I am wrong, please correct me here.
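For the ProcessBuilder route, a minimal sketch of launching a job jar synchronously from a servlet; the jar path, main class, and HDFS paths below are hypothetical placeholders:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class JobLauncher {
    // Build the command line for "hadoop jar <jar> <mainClass> <input> <output>".
    // All four arguments are placeholders for your own job.
    static List<String> buildCommand(String jar, String mainClass,
                                     String input, String output) {
        return Arrays.asList("hadoop", "jar", jar, mainClass, input, output);
    }

    // Run the job and block until the job client exits; returns its exit code.
    static int runJob(List<String> command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // or write to the servlet's log
            }
        }
        return p.waitFor(); // 0 normally means the job client finished cleanly
    }
}
```

Note that this blocks the servlet's request thread for the duration of the job, which can be minutes; for anything user-facing you would hand the work to a background executor and poll for completion.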

++ However, a MapReduce job (or a Pig script) writes its results to HDFS, and that is where I am unsure how to get the results back and feed them to the servlet. One solution, which seems amateurish and inefficient to me, is to use ProcessBuilder (or exec) again to copy the results from HDFS to the local filesystem and read them from there.

I would very much appreciate any suggestions you might share.

You can use the WebHDFS REST interface to get the files from HDFS.

The REST URL would look like:

http://something.net:50070/webhdfs/v1/path/to/output
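For example, a servlet could stream a job's output file with plain JDK HTTP calls. WebHDFS answers an `op=OPEN` request on the NameNode with a redirect to a DataNode that serves the bytes; the host, port, and path below follow the example URL above, with `part-r-00000` as an assumed output file name:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsReader {
    // Build a WebHDFS OPEN url for a file path in HDFS.
    static String openUrl(String host, int port, String path) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + path + "?op=OPEN";
    }

    // Read the file's contents; HttpURLConnection follows the
    // NameNode's HTTP redirect to the DataNode for us.
    static String readFile(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(true);
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }
}
```

With this, the servlet never copies anything to the local disk; it simply proxies the bytes from HDFS into its own response.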

BTW, to submit the jobs you could also use Oozie's client API instead of exec. Oozie's client API is much better.
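If pulling Oozie's Java client library into the servlet is inconvenient, Oozie exposes the same job submission over its REST API: POST an XML configuration to the jobs endpoint. A JDK-only sketch, where the Oozie URL and the property values are assumptions for illustration:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;

public class OozieSubmit {
    // Render job properties as the <configuration> XML document Oozie expects.
    static String toXmlConfig(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("<property><name>").append(e.getKey())
              .append("</name><value>").append(e.getValue())
              .append("</value></property>");
        }
        return sb.append("</configuration>").toString();
    }

    // POST the config to Oozie's jobs endpoint; action=start submits and
    // starts the workflow in one call. The base URL is a placeholder.
    static int submit(String oozieUrl, String xml) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(oozieUrl + "/v1/jobs?action=start").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml;charset=UTF-8");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(xml.getBytes("UTF-8"));
        }
        return conn.getResponseCode();
    }
}
```

The configuration must at least name the workflow application in HDFS (the `oozie.wf.application.path` property) and a `user.name`; Oozie replies with a JSON body containing the job id, which you can poll later for the job's status.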

