
Hadoop Looping the Reducer

I am trying to find a way to "loop" my reducer, for example:

for (String document : tempFrequencies.keySet())
{
    if (list.get(0).equals(document))
    {
        testMap.put(key.toString(), DF.format(tfIDF));
    }
}
// This lets me build a HashMap that I plan to write out to the context as
// filename = key and all of the term weights = value (a list I can parse in the next job)

The code currently runs through the entire reduce and gives me what I want for list.get(0), but the problem is that once it has finished that entire reduce, I need it to start again for list.get(1), and so on. Any ideas on how to loop the reduce phase after it has finished?

Nest the for loop.

for (int i = 0; i < number_of_times; i++) {
    // your code
}

Replace the 0 with i.
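
A minimal sketch of that nesting, assuming list, tempFrequencies, testMap, DF, tfIDF, and key are the same variables used in the question's reduce method:

// Iterate over every index of `list` instead of only index 0.
for (int i = 0; i < list.size(); i++) {
    // Inner body is unchanged from the question's snippet, with 0 replaced by i.
    for (String document : tempFrequencies.keySet()) {
        if (list.get(i).equals(document)) {
            testMap.put(key.toString(), DF.format(tfIDF));
        }
    }
}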

You can use the key-tag-value technique. In the mapper, emit (key, 0, value) for list values and (key, 1, value) for documents (?). In the reducer, values will then be grouped by key and tag, and sorted by tag for each key. You should write your own grouping comparator (and a custom partitioner). PS: I am using the same technique for graph processing. I can provide sample code after the weekend.
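
A minimal sketch of how that key-tag-value setup might look (this is the standard Hadoop secondary-sort pattern; the class and field names here are illustrative, not the answerer's actual code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

// Composite key: the natural key plus a tag (e.g. 0 = list value, 1 = document).
public class TaggedKey implements WritableComparable<TaggedKey> {
    private Text naturalKey = new Text();
    private int tag;

    public TaggedKey() {}

    public TaggedKey(String naturalKey, int tag) {
        this.naturalKey.set(naturalKey);
        this.tag = tag;
    }

    public Text getNaturalKey() { return naturalKey; }
    public int getTag() { return tag; }

    @Override
    public void write(DataOutput out) throws IOException {
        naturalKey.write(out);
        out.writeInt(tag);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        naturalKey.readFields(in);
        tag = in.readInt();
    }

    // Full sort order: natural key first, then tag, so tag-0 values reach
    // the reducer before tag-1 values for the same key.
    @Override
    public int compareTo(TaggedKey other) {
        int cmp = naturalKey.compareTo(other.naturalKey);
        return (cmp != 0) ? cmp : Integer.compare(tag, other.tag);
    }
}

// Partition on the natural key only, so every tag for a key goes to the same reducer.
class NaturalKeyPartitioner extends Partitioner<TaggedKey, Text> {
    @Override
    public int getPartition(TaggedKey key, Text value, int numPartitions) {
        return (key.getNaturalKey().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Group on the natural key only, so one reduce() call sees the values for all tags of a key.
class NaturalKeyGroupingComparator extends WritableComparator {
    protected NaturalKeyGroupingComparator() {
        super(TaggedKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        return ((TaggedKey) a).getNaturalKey().compareTo(((TaggedKey) b).getNaturalKey());
    }
}

The job would then be wired up with something like job.setPartitionerClass(NaturalKeyPartitioner.class) and job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class), so the shuffle partitions and groups on the natural key while the full composite key controls the sort order.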
