
Error while calling a mapred job from a servlet

I am a Hadoop enthusiast who is still in the learning phase. Out of curiosity, I wanted to make a servlet call a Hadoop job. I tried two approaches and both failed. Wait, first of all, can anybody please tell me whether this is feasible? If so, please enlighten me with some real-world examples (don't tell me Hue), or simply tell me that I am crazy and wasting my time.

Ok, if you are reading this then I ain't crazy. Now please take a look at my code and tell me what I am doing wrong!!!

package com.testingservlets;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
* Servlet implementation class HelloServlets
*/
  @WebServlet("/HelloServlets")
 public class HelloServlets extends HttpServlet {
     private static final long serialVersionUID = 1L;

     /**
     * @see HttpServlet#HttpServlet()
      */
   public HelloServlets() {
     super();
    // TODO Auto-generated constructor stub
    }

/**
 * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
 */
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    // TODO Auto-generated method stub
    response.setContentType("text/html");
    PrintWriter out = response.getWriter();
    // TODO Auto-generated method stub


    /*******************************************************************
     * *Approach 1
     * 
     *  Using the Hadoop code directly into servlets
     * *****************************************************************
     */

    String localPath        = "/home/asadgenx/filelist.txt";
     FileSystem fs      =   FileSystem.get( new Configuration());
     Path workingDir    = fs.getWorkingDirectory();

     out.println("DestinationPath path:"+workingDir);

     Path hdfsDir           = new Path(workingDir+"/servelets");

     out.println("DestinationPath Directory:"+workingDir);

     fs.mkdirs(hdfsDir);

     out.println("Source path:"+localPath);

     Path localFile         = new Path(localPath);
     Path newHdfsFile   = new Path(hdfsDir+"/"+"ourTestFile1.txt");

     out.println("Destination File path:"+hdfsDir+"/"+"ourTestFile1.txt");

     fs.copyFromLocalFile(localFile, newHdfsFile);


        /*******************************************************************
         * *Approach 2
         * 
         *  Executing hadoop commands as string using runtime.exec() 
         * *****************************************************************
         */

    String[] cmd = new String[] {"hadoop fs -copyFromLocal /home/asadgenx/filelist.txt /user/asad/myfile.txt"};
    Process process = Runtime.getRuntime().exec(cmd);

     out.println("File copied!!");
}

/**
 * @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse response)
 */
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    // TODO Auto-generated method stub
 }

}

Error in approach one: HTTP Status 500 - Mkdirs failed to create file:/var/lib/tomcat7/servelets

Error in approach two: HTTP Status 500 - Cannot run program "hadoop fs -copyFromLocal /home/asadgenx/filelist.txt /user/asad/myfile.txt": error=2, No such file or directory

Can any of the Hadoop experts here help me out with this, please?

I hope it is not too late to answer your question.

First of all, I will scope the question to accessing an HDFS file system from a Tomcat servlet, which is what you are trying to do. I have run into many pitfalls and read many forum posts to get past them, and it is mostly a matter of how you set everything up.

To follow approach 2 you would have to deal with the SecurityManager, and you would not want to do that.
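That said, the error=2 you got in approach 2 has a simpler cause: the whole command line was passed as a single array element, so the OS looked for a program literally named "hadoop fs -copyFromLocal ...". Even if you did go that route, each token would need to be its own element, and the hadoop binary would need an absolute path, since Tomcat's PATH usually does not include it. A minimal sketch, assuming a /usr/local/hadoop/bin install (adjust to yours):

// Hypothetical install path; each argument must be a separate array element
String[] cmd = new String[] {
    "/usr/local/hadoop/bin/hadoop", "fs", "-copyFromLocal",
    "/home/asadgenx/filelist.txt", "/user/asad/myfile.txt"
};
Process process = Runtime.getRuntime().exec(cmd);
int exitCode = process.waitFor(); // declare or handle InterruptedException
out.println("hadoop exited with code " + exitCode);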

To follow approach 1, please review this checklist:

  1. Make the appropriate jar files accessible to your webapp. I prefer to place the jars per webapp instead of making them available via Tomcat. In any case, your webapp should have access to the following jar files (I am not naming the jar versions, and maybe some of them are surplus; I am trying to cut down the list from a project that runs a MapReduce job and then fetches the results):

    • hadoop-common
    • guava
    • commons-logging
    • commons-cli
    • log4j
    • commons-lang
    • commons-configuration
    • hadoop-auth
    • slf4j-log4j
    • slf4j-api
    • hadoop-hdfs
    • protobuf-java
    • htrace-core

They are located across many directories in your Hadoop distribution.

  2. Make sure your networking configuration is fine. Test that your Hadoop services are up and running, and that you can reach all required hosts and ports from your Tomcat server to your Hadoop server. If they are both located on the same server, even better. Try to access your HDFS monitor ( http://hadoop-host:50070 ) web page from the Tomcat server; a small connectivity sketch follows this list.

  3. Adjust the access privileges to the files you will be reading or writing:

a. From your webapp, you will be able to access only files that are located inside your webapp directory.

b. From Hadoop, your webapp will connect as the user "tomcat". Make sure that the tomcat user has the right privileges to read or write the intended files in your Hadoop DFS.

  4. As Angus assumed, your Configuration object will be empty. You will need to set the required configuration parameters yourself in your servlet.
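Regarding item 2, here is a quick way to confirm from code that the NameNode web UI is reachable before debugging anything else; the host name and port are assumptions, so replace them with your own:

// Hypothetical host/port; point this at your own NameNode web UI
java.net.URL monitor = new java.net.URL("http://hadoop-host:50070");
java.net.HttpURLConnection conn = (java.net.HttpURLConnection) monitor.openConnection();
conn.setConnectTimeout(5000);
System.out.println("HDFS monitor responded with HTTP " + conn.getResponseCode());
conn.disconnect();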

Once everything is set up, you can run something like this inside your servlet:

//Set the root of the files I will work with in the local file system
String root = getServletContext().getRealPath("/") + "WEB-INF";

//Set the root of the files I will work with in Hadoop DFS
String hroot = "/home/tomcat/output/mrjob";

//Path to the files I will work with
String src = hroot + "/part-00000.avro";
String dest = root + "/classes/avro/result.avro";

//Open the HDFS file system
Configuration hdfsconf = new Configuration();

//Fake Address, replace with yours!
hdfsconf.set("fs.default.name", "hdfs://hadoop-host:54310");
hdfsconf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
hdfsconf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");

FileSystem hdfs = FileSystem.get(hdfsconf);

//Copy the result to local
hdfs.copyToLocalFile(new Path(src), new Path(dest));

//Delete result
hdfs.delete(new Path(hroot), true);

//Close the file system handler
hdfs.close();

Hope this helps!
