How to chain different reduce steps in Hadoop?
I have a problem understanding Hadoop. I have two files, and first I did a join between them. One file is about countries and the other is about the clients in each country.
Example, clients.csv:
Bertram Pearcy ,bueno,SO
Steven Ulman ,regular,ZA
countries.csv:
Name,Code
Afghanistan,AF
Åland Islands,AX
Albania,AL
…
I wrote one map/reduce job that tells me how many "good" (bueno) clients each country (ZA, SO) has, and with countries.csv I know which country each code refers to.
I programmed:
def steps(self):
    # order the operations for execution
    return [
        MRStep(mapper=self.mapper,
               reducer=self.reducer),
        MRStep(mapper=self.mapper1,
               combiner=self.combiner_cuenta_palabras,
               reducer=self.reducer2),
    ]
The result of my map/reduce is:
["South Georgia and the South Sandwich Islands"] 1
["South Sudan"] 1
["Spain"] 3
Now, I would like to know which one is the best.
I added one more reducer:
def reducer3(self, _, values):
    yield _, max(values)

def steps(self):
    # order the operations for execution
    return [
        MRStep(mapper=self.mapper,
               reducer=self.reducer),
        MRStep(mapper=self.mapper1,
               combiner=self.combiner_cuenta_palabras,
               reducer=self.reducer2),
        MRStep(  # mapper=self.mapper3,
               reducer=self.reducer3),
    ]
But I get the same answer as without that reducer.
I tried to extend my single map/reduce program with one more reduce step, but that does not work.
With my first reduce I got:
A, 10
C, 2
D, 5
Now, I would like to use that result to get: A, 10
Additional comment:
INPUT [File1]+[File2] => MAP/REDUCE => OUT
Now I need an additional map/reduce step (reusing what I already did) that gives me other answers.
First) One and only one answer. For example:
3 Spain
Second) Every country that shares the best (largest) number. For example:
3 Spain
3 Guan
I tried to use:
def reducer3(self, _, values):
    yield _, max(values)
And I added:
def steps(self):
    # order the operations for execution
    return [
        MRStep(mapper=self.mapper,
               reducer=self.reducer),
        MRStep(mapper=self.mapper1,
               combiner=self.combiner_cuenta_palabras,
               reducer=self.reducer2),
        MRStep(reducer=self.reducer3),
    ]
But I still get the same result. I know that reducer3 is being called, because if I write max(values)+1000 I get the same results, but with the numbers 1001, 1003.
Your reducer is getting 3 distinct keys, so it finds the max for each key separately, and values only has one element per key (try printing how many items it contains). Therefore, you get 3 results.
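To make that behaviour concrete, here is a minimal pure-Python sketch that imitates the grouping a reducer sees. It does not use mrjob; the `shuffle` helper is a hypothetical stand-in for the framework's shuffle phase, and the keys are plain strings for simplicity.

```python
from itertools import groupby


def shuffle(pairs):
    """Group (key, value) pairs by key, roughly as the shuffle phase does."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(ordered, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def reducer3(key, values):
    # Same logic as in the question: the max of the values for ONE key.
    yield key, max(values)


# Output of the second step, as shown in the question.
step2_output = [
    ("South Georgia and the South Sandwich Islands", 1),
    ("South Sudan", 1),
    ("Spain", 3),
]

# Each key forms its own group, so reducer3 runs three times,
# each time over a single value, and changes nothing.
step3_output = [
    out
    for key, values in shuffle(step2_output)
    for out in reducer3(key, values)
]
```

Because every country is its own key, `step3_output` contains the same three pairs that went in.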
You need a third mapper that returns, for example, (None, f'{key}|{value}'). Then all records will be sent to one reducer, where you can iterate, parse, and aggregate the results.
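A sketch of what such a mapper could look like, written as a plain generator function so it runs without mrjob. In the real job it would presumably be a method, `def mapper3(self, key, value)`, wired in with `MRStep(mapper=self.mapper3, reducer=self.reducer3)`. The sample records are taken from the question's output, with string keys assumed; since the question prints keys as lists such as `["Spain"]`, the actual job may need `key[0]` instead of `key`.

```python
def mapper3(key, value):
    # Emit every record under the same key (None) so that a single
    # reducer call receives all of them. The original key travels
    # inside the value, separated from the count by '|'.
    yield None, f"{key}|{value}"


# Two records shaped like the question's output (string keys assumed).
records = [("Spain", 3), ("South Sudan", 1)]
mapped = [pair for key, value in records for pair in mapper3(key, value)]
```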
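def reducer3(self, _, values):
    # Track the largest count seen so far and the key it belongs to.
    _max = float('-inf')
    k_out = None
    for x in values:
        k, v = x.split('|')
        v = int(v)  # compare and store numbers, not strings
        if v > _max:
            _max = v
            k_out = k
    yield k_out, _max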
That will only return one result for all values. If you want to capture equal max values, I think you'll need to iterate over the list more than once, then yield within a loop over the max elements you found.
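One possible way to keep ties is sketched below as a plain function (in the job it would take `self` as its first argument). It assumes the values arrive as 'key|value' strings, as suggested above, and that buffering them in memory is acceptable for a small list such as one row per country. The name `reducer3_with_ties` is hypothetical.

```python
def reducer3_with_ties(_, values):
    # values is a one-shot iterator, so store the parsed records
    # before making two passes over them.
    parsed = []
    for x in values:
        k, v = x.rsplit("|", 1)  # split on the last '|' only
        parsed.append((k, int(v)))
    if not parsed:
        return
    best = max(v for _, v in parsed)  # first pass: find the maximum
    for k, v in parsed:  # second pass: emit every key that reaches it
        if v == best:
            yield k, v


# Sample input mirroring the example in the question.
sample = ["Spain|3", "South Sudan|1", "Guan|3"]
winners = list(reducer3_with_ties(None, iter(sample)))
```

With this sample, both countries that reach the count 3 are emitted, in input order.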