Hadoop Streaming Job with no input file

Is it possible to execute a Hadoop Streaming job that has no input file?

In my use case, I'm able to generate the necessary records for the reducer with a single mapper and execution parameters. Currently, I'm using a stub input file with a single line; I'd like to remove this requirement.

We have 2 use cases in mind.

  1. I want to distribute the loading of files into HDFS from a network location available to all nodes. Basically, I'm going to run ls in the mapper and send the output to a small set of reducers (see the sketch after this list).
  2. We are going to be running fits leveraging several different parameter ranges against several models. The model names do not change and will go to the reducer as keys, while the list of tests to run is generated in the mapper.
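
For use case 1, here is a minimal sketch of what the mapper and reducer could look like. The share mount point /mnt/shared_data, the HDFS target /data/ingest, and the script names are assumptions for illustration, not part of the original question:

    # mapper.sh -- hypothetical sketch: list the files on the network share
    # (mounted on every node) and emit "filename<TAB>1" so the names are
    # partitioned across the reducers.
    ls /mnt/shared_data | awk '{print $0 "\t1"}'

    # reducer.sh -- hypothetical sketch: each reducer copies its share of the
    # files from the mounted location into HDFS.
    while IFS=$'\t' read -r name _; do
      hadoop fs -put "/mnt/shared_data/$name" "/data/ingest/$name"
    done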

According to the docs this is not possible. The following are required parameters for execution:

  • input directoryname or filename
  • output directoryname
  • mapper executable or JavaClassName
  • reducer executable or JavaClassName

It looks like providing a dummy input file is the way to go currently.
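
A minimal sketch of the stub-input approach, assuming Hadoop 2.x layout and hypothetical paths, script names, and reducer count (the location of the streaming jar varies between distributions):

    # Put a one-line stub file into HDFS just to satisfy the -input requirement.
    echo "go" | hadoop fs -put - /tmp/stub_input.txt

    # Run the streaming job; the mapper ignores its stdin and generates the
    # real records for the reducers.
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -D mapreduce.job.reduces=4 \
      -input /tmp/stub_input.txt \
      -output /tmp/job_output \
      -mapper mapper.sh \
      -reducer reducer.sh \
      -file mapper.sh \
      -file reducer.sh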
