
Hadoop Looping the Reducer

I am trying to find a way to "loop" my reducer, for example:

for (String document : tempFrequencies.keySet())
{
    if (list.get(0).equals(document))
    {
        testMap.put(key.toString(), DF.format(tfIDF));
    }
}
// This lets me build a HashMap that I plan to write out to the context as
// filename = key and all of the term weights = value (a list I can parse in the next job)

The code currently runs through the entire reduce and gives me what I want for list.get(0), but the problem is that once it has finished that entire reduce, I need it to start again for list.get(1), and so on. Any ideas on how to loop the reduce phase after it has finished?

Nest the for loop.

for (int i = 0; i < number_of_times; i++) {
    // your code
}

Replace the 0 with i.
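
A minimal sketch of that nesting, assuming list, tempFrequencies, testMap, DF, tfIDF, and key are the same variables used in the question's reduce method:

// Iterate over every index of `list` instead of only index 0.
for (int i = 0; i < list.size(); i++) {
    // Inner body is unchanged from the question's snippet, with 0 replaced by i.
    for (String document : tempFrequencies.keySet()) {
        if (list.get(i).equals(document)) {
            testMap.put(key.toString(), DF.format(tfIDF));
        }
    }
}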

You can use the key-tag-value technique. In the mapper, emit (key, 0, value) for list values and (key, 1, value) for documents (?). In the reducer, values will then be grouped by key and tag, and sorted by tag for each key. You should write your own grouping comparator (and a custom partitioner). PS: I am using the same technique for graph processing. I can provide sample code after the weekend.
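
A minimal sketch of how that key-tag-value setup might look (this is the standard Hadoop secondary-sort pattern; the class and field names here are illustrative, not the answerer's actual code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

// Composite key: the natural key plus a tag (e.g. 0 = list value, 1 = document).
public class TaggedKey implements WritableComparable<TaggedKey> {
    private Text naturalKey = new Text();
    private int tag;

    public TaggedKey() {}

    public TaggedKey(String naturalKey, int tag) {
        this.naturalKey.set(naturalKey);
        this.tag = tag;
    }

    public Text getNaturalKey() { return naturalKey; }
    public int getTag() { return tag; }

    @Override
    public void write(DataOutput out) throws IOException {
        naturalKey.write(out);
        out.writeInt(tag);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        naturalKey.readFields(in);
        tag = in.readInt();
    }

    // Full sort order: natural key first, then tag, so tag-0 values reach
    // the reducer before tag-1 values for the same key.
    @Override
    public int compareTo(TaggedKey other) {
        int cmp = naturalKey.compareTo(other.naturalKey);
        return (cmp != 0) ? cmp : Integer.compare(tag, other.tag);
    }
}

// Partition on the natural key only, so every tag for a key goes to the same reducer.
class NaturalKeyPartitioner extends Partitioner<TaggedKey, Text> {
    @Override
    public int getPartition(TaggedKey key, Text value, int numPartitions) {
        return (key.getNaturalKey().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Group on the natural key only, so one reduce() call sees the values for all tags of a key.
class NaturalKeyGroupingComparator extends WritableComparator {
    protected NaturalKeyGroupingComparator() {
        super(TaggedKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        return ((TaggedKey) a).getNaturalKey().compareTo(((TaggedKey) b).getNaturalKey());
    }
}

The job would then be wired up with something like job.setPartitionerClass(NaturalKeyPartitioner.class) and job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class), so the shuffle partitions and groups on the natural key while the full composite key controls the sort order.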
