Hadoop Streaming Job with no input file

Is it possible to execute a Hadoop Streaming job that has no input file?

In my use case, I'm able to generate the necessary records for the reducer with a single mapper and execution parameters. Currently, I'm using a stub input file with a single line; I'd like to remove this requirement.

We have 2 use cases in mind.

  1. I want to distribute the loading of files into HDFS from a network location available to all nodes. Basically, I'm going to run ls in the mapper and send the output to a small set of reducers (see the sketch after this list).
  2. We are going to be running fits leveraging several different parameter ranges against several models. The model names do not change and will go to the reducer as keys, while the list of tests to run is generated in the mapper.
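
For use case 1, here is a minimal sketch of what the mapper and reducer could look like. The share mount point /mnt/shared_data, the HDFS target /data/ingest, and the script names are assumptions for illustration, not part of the original question:

    # mapper.sh -- hypothetical sketch: list the files on the network share
    # (mounted on every node) and emit "filename<TAB>1" so the names are
    # partitioned across the reducers.
    ls /mnt/shared_data | awk '{print $0 "\t1"}'

    # reducer.sh -- hypothetical sketch: each reducer copies its share of the
    # files from the mounted location into HDFS.
    while IFS=$'\t' read -r name _; do
      hadoop fs -put "/mnt/shared_data/$name" "/data/ingest/$name"
    done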

According to the docs this is not possible. The following are required parameters for execution:

  • input directoryname or filename
  • output directoryname
  • mapper executable or JavaClassName
  • reducer executable or JavaClassName

It looks like providing a dummy input file is the way to go currently.
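
A minimal sketch of the stub-input approach, assuming Hadoop 2.x layout and hypothetical paths, script names, and reducer count (the location of the streaming jar varies between distributions):

    # Put a one-line stub file into HDFS just to satisfy the -input requirement.
    echo "go" | hadoop fs -put - /tmp/stub_input.txt

    # Run the streaming job; the mapper ignores its stdin and generates the
    # real records for the reducers.
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -D mapreduce.job.reduces=4 \
      -input /tmp/stub_input.txt \
      -output /tmp/job_output \
      -mapper mapper.sh \
      -reducer reducer.sh \
      -file mapper.sh \
      -file reducer.sh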
