
How to let each mapper read the same file in Hadoop

In my Hadoop job, in addition to the input data files, I want each mapper (the map method) to read a common file that I have put in HDFS. Every mapper would read this file and keep its content in memory. How can I do this?

Depending on your needs, there are different approaches:

  • Read the file directly from HDFS in each mapper. This is only recommended when the common file is relatively small.
  • Use CompositeInputFormat to read multiple files at once in each mapper and perform a so-called map-side join. Both files will be split and partitioned the same way.
  • Add the file to the DistributedCache during job setup. The file will be stored on every node and can be accessed by all mappers.
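
The first and third approaches share one pattern: load the common file once per mapper task, keep it in memory, and only look values up while processing records. Below is a minimal, Hadoop-free sketch of that pattern in plain Java, so it runs without a cluster. The class name `CommonFileMapperSketch`, the tab-separated key/value layout of the common file, and the `UNKNOWN` fallback are all assumptions made for illustration. In a real job, the loading step would normally go into the mapper's `setup(Context)` method, with the reader wrapping a stream opened from HDFS or from the locally cached copy of the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Mapper; names are illustrative only.
public class CommonFileMapperSketch {
    // Content of the common file, held in memory for the lifetime of one mapper task.
    private final Map<String, String> lookup = new HashMap<>();

    // Plays the role of Mapper.setup(Context): runs once, before any map() call.
    // Assumption: each line of the common file is "key<TAB>value".
    public void setup(Reader commonFile) {
        try (BufferedReader in = new BufferedReader(commonFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
                // Lines without a tab are silently skipped in this sketch.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the role of map(): called once per input record.
    // It only consults the in-memory copy and never re-reads the file.
    public String map(String key) {
        return key + "\t" + lookup.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) {
        CommonFileMapperSketch mapper = new CommonFileMapperSketch();
        // A StringReader stands in for the file stream a real job would open.
        mapper.setup(new StringReader("u1\tAlice\nu2\tBob\n"));
        System.out.println(mapper.map("u1"));
        System.out.println(mapper.map("u9"));
    }
}
```

Because the file is loaded in the setup step rather than inside `map`, the cost of reading it is paid once per task instead of once per record. This also explains why the in-memory approach suits only relatively small files.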
