
Can Configuration.set be used in the Mapper?

I am trying to save some data from the Mapper to the Job/Main so that I can use it in other jobs.

I tried to use a static variable in my main class (the one that contains the main function), but when the Mapper adds data to that static variable and I print it after the job is done, there is no new data. It is as if the Mapper modified a different instance of that static variable.

Now I'm trying to use the Configuration to set the data from the Mapper:

Mapper

context.getConfiguration().set("3", "somedata");

Main

boolean step1Completed = step1.waitForCompletion(true);
System.out.println(step1.getConfiguration().get("3"));

Unfortunately this prints null.

Is there another way to do this? I am trying to save some data so that I can use it in other jobs, and using a file just for that feels a bit extreme, since the data is only a small (int, String) index mapping some titles that I will need in my last job.

It is not possible, as far as I know. Mappers and Reducers work independently in a distributed fashion. Each task has its own local Configuration instance, and each job is independent, so you have to persist the data to HDFS.
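For example, here is a minimal sketch of persisting such a small index to HDFS with the FileSystem API. The class name, the target path, and the point where it is called (e.g. from the driver after collecting the data, or from a single task's cleanup() with a per-task path) are all assumptions, not part of the original question:

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IndexWriterUtil {
    /** Writes "id<TAB>title" lines to the given HDFS path (hypothetical helper). */
    public static void writeIndex(Configuration conf, Path path,
                                  Map<Integer, String> index) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        // Overwrite the file if it already exists, then stream the entries out.
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(fs.create(path, true), StandardCharsets.UTF_8))) {
            for (Map.Entry<Integer, String> e : index.entrySet()) {
                out.write(e.getKey() + "\t" + e.getValue());
                out.newLine();
            }
        }
    }
}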

You can also take advantage of the MapReduce chaining mechanism (example) to run a chain of jobs. In addition, you can design a workflow in Azkaban, Oozie, etc. to pass the output of one job to another.
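As a rough illustration of driver-side chaining (the ChainDriver class name, the intermediate path, and the identity mapper/reducer defaults are assumptions), the second job simply reads the first job's output directory as its input:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path intermediate = new Path("/tmp/step1-output"); // hypothetical path

        Job step1 = Job.getInstance(conf, "step1");
        step1.setJarByClass(ChainDriver.class);
        // ... set step1's mapper/reducer and output key/value classes here ...
        FileInputFormat.addInputPath(step1, new Path(args[0]));
        FileOutputFormat.setOutputPath(step1, intermediate);
        if (!step1.waitForCompletion(true)) {
            System.exit(1); // stop the chain if the first job fails
        }

        Job step2 = Job.getInstance(conf, "step2");
        step2.setJarByClass(ChainDriver.class);
        // ... set step2's mapper/reducer and output key/value classes here ...
        FileInputFormat.addInputPath(step2, intermediate); // consume step1's output
        FileOutputFormat.setOutputPath(step2, new Path(args[1]));
        System.exit(step2.waitForCompletion(true) ? 0 : 1);
    }
}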

It is indeed not possible, since the configuration flows from the job to the mapper/reducer and not the other way around. I ended up just reading the file directly from HDFS in my last job's setup.
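For reference, a minimal sketch of that approach (the TitleMapper class name, the index path, and the tab-separated format are assumptions): the small index is loaded from HDFS in the mapper's setup() so it is available to every map() call.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TitleMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<Integer, String> titles = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Path of the index written by the earlier job (hypothetical location).
        Path indexPath = new Path("/tmp/title-index/part-r-00000");
        FileSystem fs = FileSystem.get(context.getConfiguration());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(indexPath), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                titles.put(Integer.parseInt(parts[0]), parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use the titles map while processing each record ...
    }
}

Note that every map task loads its own copy of the index in setup(), which is fine as long as the index stays small.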

Thank you all for the input.
