
Error while calling a mapred job from a servlet

I am a Hadoop enthusiast who is still in the learning phase. Out of curiosity I tried something: I wanted to make a servlet call a Hadoop job. I tried two approaches and both failed. First of all, can anybody please tell me whether this is feasible? If so, please enlighten me with some real-world examples (don't tell me Hue), or you can simply tell me that I am crazy and wasting my time.

Ok, if you are reading this then I ain't crazy. Now please take a look at my code and tell me what I am doing wrong!

package com.testingservlets;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
 * Servlet implementation class HelloServlets
 */
@WebServlet("/HelloServlets")
public class HelloServlets extends HttpServlet {
    private static final long serialVersionUID = 1L;

    /**
     * @see HttpServlet#HttpServlet()
     */
    public HelloServlets() {
        super();
        // TODO Auto-generated constructor stub
    }

    /**
     * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // TODO Auto-generated method stub
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        /*******************************************************************
         * Approach 1
         *
         * Using the Hadoop code directly into servlets
         *******************************************************************/

        String localPath = "/home/asadgenx/filelist.txt";
        FileSystem fs = FileSystem.get(new Configuration());
        Path workingDir = fs.getWorkingDirectory();

        out.println("DestinationPath path:" + workingDir);

        Path hdfsDir = new Path(workingDir + "/servelets");

        out.println("DestinationPath Directory:" + workingDir);

        fs.mkdirs(hdfsDir);

        out.println("Source path:" + localPath);

        Path localFile = new Path(localPath);
        Path newHdfsFile = new Path(hdfsDir + "/" + "ourTestFile1.txt");

        out.println("Destination File path:" + hdfsDir + "/" + "ourTestFile1.txt");

        fs.copyFromLocalFile(localFile, newHdfsFile);

        /*******************************************************************
         * Approach 2
         *
         * Executing hadoop commands as string using runtime.exec()
         *******************************************************************/

        String[] cmd = new String[] {"hadoop fs -copyFromLocal /home/asadgenx/filelist.txt /user/asad/myfile.txt"};
        Process process = Runtime.getRuntime().exec(cmd);

        out.println("File copied!!");
    }

    /**
     * @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // TODO Auto-generated method stub
    }

}

Error in approach one: HTTP Status 500 - Mkdirs failed to create file:/var/lib/tomcat7/servelets

Error in approach two: HTTP Status 500 - Cannot run program "hadoop fs -copyFromLocal /home/asadgenx/filelist.txt /user/asad/myfile.txt": error=2, No such file or directory

Can any of the Hadoop experts here help me out with this, please!!!

I hope it is not too late to answer your question.

First of all, I will scope the question to accessing an HDFS file system from a Tomcat servlet, which is what you are trying to do. I ran into many pitfalls and read many forum posts to get this working, and it is mostly a matter of how you set everything up.

To follow approach 2 you would have to deal with the SecurityManager, and you wouldn't want to do that.
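For what it's worth, the error=2 you saw in approach 2 happens because the whole command line was passed as a single array element, so the JVM tried to execute a program literally named "hadoop fs -copyFromLocal ...". If you ever did insist on that route, the command would at least have to be split into separate arguments and point at the hadoop binary by absolute path, since Tomcat's PATH usually does not include it. A minimal, hypothetical sketch (the /usr/local/hadoop/bin/hadoop location is an assumption; adjust it to your installation):

// Hypothetical fragment for inside doGet(); "out" is the servlet's PrintWriter.
String[] cmd = new String[] {
    "/usr/local/hadoop/bin/hadoop",   // assumed install path, adjust to yours
    "fs", "-copyFromLocal",
    "/home/asadgenx/filelist.txt",
    "/user/asad/myfile.txt"
};
try {
    Process process = Runtime.getRuntime().exec(cmd);
    int exitCode = process.waitFor();  // block until the copy finishes
    out.println("hadoop fs exit code: " + exitCode);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new ServletException(e);
}

Even then, the copy runs as the tomcat user and is subject to the same permission caveats as approach 1, so approach 1 is still the cleaner option.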

To follow approach 1, please review this checklist (a small sanity-check sketch follows the list):

  1. Make the appropriate jar files accessible to your webapp. I prefer to place the jars per webapp instead of making them available via Tomcat. Anyway, your webapp should have access to the following jar files (I am not naming the jar versions; some of them may be surplus, since I am trying to cut the list down from a project that runs a MapReduce job and then retrieves the results):

    • hadoop-common
    • guava
    • commons-logging
    • commons-cli
    • log4j
    • commons-lang
    • commons-configuration
    • hadoop-auth
    • slf4j-log4j
    • slf4j-api
    • hadoop-hdfs
    • protobuf-java
    • htrace-core

They are scattered across many directories in your Hadoop distribution.

  2. Make sure your networking configuration is fine. Check that your Hadoop services are up and running, and that every required host and port is reachable from your Tomcat server; if both run on the same machine, even better. Try to open your HDFS monitor web page ( http://hadoop-host:50070 ) from the Tomcat server.

  3. Adjust the access privileges on the files you will be reading and writing:

a. From your webapp, you will only be able to access files that are located inside your webapp directory.

b. On the Hadoop side, your webapp will connect as the user "tomcat". Make sure that the tomcat user has the privileges needed to read and write the intended files in your Hadoop DFS.

  4. As Angus assumed, your Configuration object will be empty. You will need to set the required configuration parameters yourself in your servlet.
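Before moving on, a quick pre-flight check run from inside the webapp can confirm points 1 to 3 of the checklist above. This is only a minimal sketch, not production code: hadoop-host, the ports and the HDFS path are placeholders, and "out" is assumed to be the servlet's PrintWriter.

// Hypothetical pre-flight checks; replace hostnames, ports and paths with your own.
try {
    // Point 1: classpath check, fails fast if the HDFS client jars are not visible to the webapp
    Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");
    out.println("Hadoop HDFS classes are on the webapp classpath");

    // Point 2: network check, can the Tomcat host reach the namenode web UI port at all?
    java.net.Socket socket = new java.net.Socket("hadoop-host", 50070);
    socket.close();
    out.println("Namenode web port is reachable");

    // Point 3: privilege check, print owner and permissions of the directory you intend to write to
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://hadoop-host:54310"); // placeholder address
    conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
    FileSystem fs = FileSystem.get(conf);
    org.apache.hadoop.fs.FileStatus status = fs.getFileStatus(new Path("/home/tomcat/output")); // placeholder path
    out.println("Owner: " + status.getOwner() + ", permissions: " + status.getPermission());
    fs.close();
} catch (ClassNotFoundException e) {
    out.println("Hadoop jars are missing from the webapp classpath: " + e.getMessage());
}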

Once everything is set up, you can run something like this inside your servlet:

//Set the root of the files I will work with in the local file system
String root = getServletContext().getRealPath("/") + "WEB-INF";

//Set the root of the files I will work with in Hadoop DFS
String hroot = "/home/tomcat/output/mrjob";

//Path to the files I will work with
String src = hroot + "/part-00000.avro";
String dest = root + "/classes/avro/result.avro";

//Open the HDFS file system
Configuration hdfsconf = new Configuration();

//Fake Address, replace with yours!
hdfsconf.set("fs.default.name", "hdfs://hadoop-host:54310");
hdfsconf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
hdfsconf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");

FileSystem hdfs = FileSystem.get(hdfsconf);

//Copy the result to local
hdfs.copyToLocalFile(new Path(src), new Path(dest));

//Delete result
hdfs.delete(new Path(hroot), true);

//Close the file system handler
hdfs.close();

Hope this helps!
