
HADOOP - number of output files produced as mapper output

I want to know how many files will be produced if only a single mapper (no reducer, no combiner, etc.) is run for all file splits.

Example: suppose there are 4 file splits and a single mapper processes all of them. How many files does the mapper output: one or four?

Each map task produces one output file. The same mapper class is used for every split, but the framework launches a separate map task per input split. If you have one file on HDFS that is divided into four input splits (typically one per block), you will get four output files from a map-only job. If the input file is not in a splittable format, such as GZip, the whole file is handed to a single map task, which outputs one file.
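The rule above can be sketched as a small calculation. This is only a simplified model, not Hadoop code: the function names `count_map_tasks` and `output_file_names` are hypothetical, the split size is assumed to be a fixed number of bytes, and details of the real split computation (for example, the last split being allowed to run slightly larger than the others) are ignored. The `part-m-NNNNN` naming reflects what the newer `mapreduce` API writes for map-only jobs.

```python
import math

MB = 1024 * 1024


def count_map_tasks(file_sizes, split_size, splittable=True):
    """Estimate the number of map tasks (hypothetical helper, simplified model).

    file_sizes: sizes of the input files in bytes
    split_size: assumed input split size in bytes
    splittable: False for formats such as GZip that cannot be split
    """
    tasks = 0
    for size in file_sizes:
        if splittable and size > 0:
            # One task per split of the file.
            tasks += math.ceil(size / split_size)
        else:
            # A non-splittable (or empty) file goes to a single task.
            tasks += 1
    return tasks


def output_file_names(num_tasks):
    """In a map-only job, each map task writes exactly one output file."""
    return [f"part-m-{i:05d}" for i in range(num_tasks)]


# One 512 MB file with 128 MB splits: four map tasks, four output files.
tasks = count_map_tasks([512 * MB], 128 * MB)
print(tasks, output_file_names(tasks))

# The same file compressed with GZip: one map task, one output file.
tasks = count_map_tasks([512 * MB], 128 * MB, splittable=False)
print(tasks, output_file_names(tasks))
```

For the example in the question (4 splits, no reducer), this model gives four output files, `part-m-00000` through `part-m-00003`.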
