How to write Hadoop MapReduce programs in Scala
I am writing a MapReduce application in Scala. Everything works fine up to the map function, but I am running into a problem while writing the reducer.
override def reduce(key: Text, values: java.lang.Iterable[Text],
context: ReducerContext) {
}
ReducerContext is defined so that it refers to the Context inner class, so that part is fine.
The issue is with the (Java) Iterable component: I am not able to iterate through it. I understand that I first have to convert it into a Scala Iterable and then iterate over it. I did that too, but still did not get the expected result.
I have tried both scala.collection.JavaConverters._ and JavaConversions._. Here are a few scenarios that did not work:
val jit: java.util.Iterator[Text] = values.iterator()
val abc = JavaConversions.asScalaIterator(jit) // also tried: val abc = jit.asScala
println("size " + abc.size) // prints the correct size
for (temp <- abc) {
  // execution never enters this loop
}
Similarly, I have tried converting this Iterator into a list or array (toList/toArray), but in vain: the size of the resulting list or array is 0. No matter what I do, I am not able to iterate through the values.
I appreciate any help on this.
Thanks
You can import JavaConversions to convert the Iterable automatically:
import scala.collection.JavaConversions._
If you still have a problem, can you paste your code?
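For readers on newer Scala versions: JavaConversions was deprecated in Scala 2.12 and removed in 2.13, so the implicit import above may not compile there. A minimal sketch of the explicit alternative, using .asScala from JavaConverters (itself deprecated in 2.13 in favor of scala.jdk.CollectionConverters), is shown here. The object name ExplicitConversionDemo, the helper lengths, and the use of String instead of Text are assumptions made only to keep the example runnable without Hadoop.

```scala
import scala.collection.JavaConverters._

object ExplicitConversionDemo {
  // Hypothetical helper: converts the Java Iterable explicitly and
  // traverses it exactly once.
  def lengths(values: java.lang.Iterable[String]): List[Int] =
    values.asScala.map(_.length).toList

  def main(args: Array[String]): Unit = {
    // A java.util.List stands in for the values passed to reduce.
    val values: java.lang.Iterable[String] = java.util.Arrays.asList("map", "reduce")
    println(lengths(values)) // List(3, 6)
  }
}
```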
The tricky thing about the values you receive in reduce is that they can only be traversed once. abc.size traverses values. After that, values is no longer usable: it is already exhausted, so a second traversal yields nothing.
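The single-pass behaviour can be reproduced without Hadoop. The sketch below assumes a plain java.util.Iterator over strings as a stand-in for the reducer's values; the names SinglePassDemo and sizeThenRest are made up for illustration. Calling size consumes the iterator, so nothing is left for a later loop or toList.

```scala
import scala.collection.JavaConverters._

object SinglePassDemo {
  // Returns the size reported by the iterator and whatever is left afterwards.
  def sizeThenRest(javaIt: java.util.Iterator[String]): (Int, List[String]) = {
    val it = javaIt.asScala
    val n = it.size       // walks the iterator to the end to count the elements
    val rest = it.toList  // the iterator is exhausted, so this is empty
    (n, rest)
  }

  def main(args: Array[String]): Unit = {
    val (n, rest) = sizeThenRest(java.util.Arrays.asList("a", "b", "c").iterator())
    println(s"size = $n, remaining = $rest") // size = 3, remaining = List()
  }
}
```

This matches the symptom in the question: the size is printed correctly, and the loop that follows never runs.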
So the correct code should be:
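// don't touch values before the loop (no size, toList, etc.)
for (value <- values) { // relies on the implicit conversion from the JavaConversions import
  // do something
  val v = value.toString
  // Don't save value itself: the object is reused. Its content changes on each step, but the reference stays the same.
}
// don't use values after the loop either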
As I mentioned in the comment, the type of value is Text. While you traverse values, the content of value changes, but the reference stays the same. So don't try to save value in a Collection, or you will end up with a Collection in which all the items are the same.
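To see why saving the reference goes wrong, here is a small sketch that does not depend on Hadoop. MutableText is a hypothetical stand-in for Text, and reusingIterator imitates an iterator that hands out the same object on every step with new content. Both are assumptions for illustration, not Hadoop's actual implementation.

```scala
import scala.collection.mutable.ListBuffer

object ReuseDemo {
  // Hypothetical stand-in for Hadoop's Text: a mutable holder for a string.
  final class MutableText(var content: String) {
    override def toString: String = content
  }

  // Hands out the SAME instance on every step, overwriting its content.
  def reusingIterator(data: Seq[String]): Iterator[MutableText] = {
    val shared = new MutableText("")
    data.iterator.map { s => shared.content = s; shared }
  }

  // Wrong: stores the reference, so every entry points at the same object.
  def saveReferences(data: Seq[String]): List[String] = {
    val saved = ListBuffer.empty[MutableText]
    for (v <- reusingIterator(data)) saved += v
    saved.map(_.toString).toList
  }

  // Right: copies the content (here via toString) before the next step.
  def saveCopies(data: Seq[String]): List[String] = {
    val saved = ListBuffer.empty[String]
    for (v <- reusingIterator(data)) saved += v.toString
    saved.toList
  }

  def main(args: Array[String]): Unit = {
    println(saveReferences(Seq("a", "b", "c"))) // List(c, c, c)
    println(saveCopies(Seq("a", "b", "c")))     // List(a, b, c)
  }
}
```

If you need both the count and the elements, copying each value once into a list like saveCopies does and then working on that list avoids a second pass over values.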