
Servlet executes a Hadoop MapReduce job and displays the result

I have a Tomcat server with several servlets, a MapReduce job (written using Hadoop), and Pig installed; everything sits in the same cluster as Hadoop.

Now I need my servlet to be able to execute a MapReduce program (or a Pig script) and display the results it returns. Is there any way to make a servlet execute a MapReduce job and get back the results?

++ I think it is possible to make my servlet execute a MapReduce job (or a Pig script) simply by calling exec or ProcessBuilder. If I am wrong, please correct me here.
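For that launch step, a minimal sketch along those lines (the jar path, main class, and HDFS directories below are placeholders, not values from the question):

```java
import java.io.IOException;
import java.util.List;

public class JobLauncher {
    // Builds the command line a servlet could run to submit a MapReduce job.
    // "/opt/jobs/myjob.jar", "com.example.MyJob" and the HDFS paths are
    // placeholders -- substitute your own jar, driver class and directories.
    static List<String> buildCommand() {
        return List.of("hadoop", "jar", "/opt/jobs/myjob.jar",
                "com.example.MyJob", "/user/web/input", "/user/web/output");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = buildCommand();
        System.out.println(String.join(" ", cmd));
        // To actually submit from the servlet:
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor(); // 0 means the job client exited cleanly
    }
}
```

Note that `waitFor()` blocks the servlet thread until the job client exits, so long-running jobs are usually better launched from a background thread or executor.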

++ However, a MapReduce job (or a Pig script) writes its results to HDFS, and that is where I am unsure how to get the results back and feed them to the servlet. One solution, which seems amateurish and inefficient to me, is to use ProcessBuilder (or exec) again to copy the results from HDFS to the local filesystem and read them from there.
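If you do go the copy-to-local route, the reading side is just merging the `part-*` files the job leaves in its output directory. A small sketch (the directory names are hypothetical; the `main` method fakes an output directory purely for demonstration):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ResultReader {
    // Concatenates the part-* files a MapReduce job writes to its output
    // directory, after they have been copied down to the local filesystem
    // (e.g. via "hadoop fs -copyToLocal /user/web/output /tmp/output").
    static String readResults(Path outputDir) throws IOException {
        try (Stream<Path> files = Files.list(outputDir)) {
            return files.filter(p -> p.getFileName().toString().startsWith("part-"))
                        .sorted() // part-r-00000, part-r-00001, ...
                        .map(p -> {
                            try { return Files.readString(p); }
                            catch (IOException e) { throw new UncheckedIOException(e); }
                        })
                        .collect(Collectors.joining());
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with a fabricated output directory; a servlet would point this
        // at the real copied-down directory and write the string to its response.
        Path dir = Files.createTempDirectory("mr-output");
        Files.writeString(dir.resolve("part-r-00000"), "apple\t3\n");
        Files.writeString(dir.resolve("part-r-00001"), "banana\t5\n");
        Files.writeString(dir.resolve("_SUCCESS"), ""); // marker file, skipped
        System.out.print(readResults(dir));
    }
}
```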

Would very much appreciate any suggestions you might share.

You can use the WebHDFS REST interface to get the files from HDFS.

The REST URL would look like:

http://something.net:50070/webhdfs/v1/path/to/output
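A complete WebHDFS request also needs an operation parameter, e.g. `op=OPEN` to read a file or `op=LISTSTATUS` to list a directory. A sketch of building such a URL (the host, file path, and user name are placeholders; 50070 is the default NameNode HTTP port in Hadoop 2.x):

```java
import java.net.URI;

public class WebHdfsUrl {
    // Builds a WebHDFS read URL for a file in HDFS. The NameNode host/port,
    // path, and user.name below are placeholders, not real values.
    static URI openUrl(String host, int port, String path, String user) {
        return URI.create(String.format(
                "http://%s:%d/webhdfs/v1%s?op=OPEN&user.name=%s",
                host, port, path, user));
    }

    public static void main(String[] args) {
        URI uri = openUrl("something.net", 50070, "/path/to/output/part-r-00000", "web");
        System.out.println(uri);
        // A servlet could then stream the body straight to its response, e.g.:
        // try (InputStream in = uri.toURL().openStream()) {
        //     in.transferTo(response.getOutputStream());
        // }
        // (WebHDFS answers OPEN with a redirect to a DataNode, so the HTTP
        // client must follow redirects.)
    }
}
```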

BTW, to submit the jobs you could also use Oozie's client API instead of exec. Oozie's client API is much better.
