
hadoop - Where are input/output files stored in Hadoop, and how do I execute a Java file in Hadoop?

Original post: 2011-03-21 10:19:11, tagged hadoop

Suppose I write a Java program and I want to run it in Hadoop. Then:

1) Where should the file be saved?
2) How do I access it from Hadoop?
3) Should I be calling it with the following command? hadoop classname
4) What is the command in Hadoop to execute the Java file?

3 answers

The simplest answers I can think of to your questions are:

1) Anywhere
2,3,4) $HADOOP_HOME/bin/hadoop jar [path_to_your_jar_file]

A similar question was asked here: "Executing helloworld.java in apache hadoop".

It may seem complicated, but it's simpler than you might think!

1) Compile your map/reduce classes and your main class into a jar. Let's call this jar myjob.jar.
   - This jar does not need to include the Hadoop libraries, but it should include any other dependencies you have.
   - Your main method should set up and run your map/reduce job; here is an example.
2) Put this jar on any machine with the hadoop command line utility installed.
3) Run your main method using the hadoop command line utility:
   - hadoop jar myjob.jar
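
One detail that may help with the "hadoop classname" part of the question: as far as I know, `hadoop jar myjob.jar` with no class name only works when the jar's manifest carries a `Main-Class` attribute; otherwise the class name goes right after the jar path. Below is a rough sketch, using only the standard `java.util.jar` classes, that checks a jar and prints which form of the command to use. The class `JarMainClassCheck`, its helper methods, and the driver name `com.example.MyDriver` are made up for illustration, and the demo jar it writes is empty apart from its manifest.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Hadoop: inspects a jar's manifest.
public class JarMainClassCheck {

    // Returns the Main-Class manifest attribute, or null if the jar has none.
    public static String mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Writes a demo jar containing only a manifest; mainClass may be null.
    public static Path writeDemoJar(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Without Manifest-Version the manifest is written out empty.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        Path jar = Files.createTempFile("myjob", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
            // No class files are added; the manifest is enough for this demo.
        }
        return jar;
    }

    // Builds the command line: the class name is appended only when the
    // manifest does not already name a main class.
    public static String suggestCommand(Path jar, String fallbackMainClass) throws IOException {
        String mainClass = mainClassOf(jar);
        if (mainClass != null) {
            return "hadoop jar " + jar;
        }
        return "hadoop jar " + jar + " " + fallbackMainClass;
    }

    public static void main(String[] args) throws IOException {
        Path withMain = writeDemoJar("com.example.MyDriver");
        Path withoutMain = writeDemoJar(null);
        System.out.println(suggestCommand(withMain, "com.example.MyDriver"));
        System.out.println(suggestCommand(withoutMain, "com.example.MyDriver"));
        Files.deleteIfExists(withMain);
        Files.deleteIfExists(withoutMain);
    }
}
```

Either way, anything you put after the jar path (and the class name, if needed) is handed to your main method as its arguments, which is where input and output paths usually go.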

Hope that helps.

Where should the file be saved?

The data should be saved in HDFS. You will probably want to load it into the cluster from your data source using something like Apache Flume. The file can be placed anywhere, but the usual home is /user/hadoop/.