
How to write Hadoop MapReduce programs in Scala

I am writing a MapReduce application in Scala. Everything works fine up to the map function, but I am running into a problem while writing the reducer.

override def reduce(key: Text, values: java.lang.Iterable[Text], 
                    context: ReducerContext): Unit = {
}

ReducerContext is defined so that it refers to the Context inner class, so that part is fine.

The issue is with the Iterable (Java) parameter: I am not able to iterate through it. I understand that I first have to convert it into a Scala Iterable and then iterate over that. I did so, but still did not get the expected result.

I have tried both scala.collection.JavaConverters._ and JavaConversions._. Here are a few scenarios that did not work:

val jit: java.util.Iterator[Text]= values.iterator()
val abc = JavaConversions.asScalaIterator(jit) // alternatively: val abc = jit.asScala
println("size " + abc.size) // prints the correct size
for (temp <- abc) {
  // execution never enters this loop
}

Similarly, I have tried converting this Iterator into a list or an array, but in vain. Once I convert it (toList/toArray), the size of the resulting list or array is 0. No matter what I do, I am not able to iterate through the values.

I appreciate any help on this.

Thanks

You can import JavaConversions to convert the Iterable automatically.

import scala.collection.JavaConversions._
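
As a side note, JavaConversions was deprecated in Scala 2.12 and removed in 2.13, so on newer versions the explicit `asScala` conversion from JavaConverters (or `scala.jdk.CollectionConverters` on 2.13 and later) is the usual substitute. The following is only a minimal sketch of that explicit style; the object name `ExplicitConversionSketch` and the helper `joinValues` are made up for illustration, and plain `String` values stand in for Hadoop's `Text` so the snippet runs without Hadoop on the classpath.

```scala
// On Scala 2.13+, scala.jdk.CollectionConverters._ provides the same asScala.
import scala.collection.JavaConverters._

object ExplicitConversionSketch {
  // Hypothetical helper: walks a Java Iterable exactly once and joins its elements.
  def joinValues(values: java.lang.Iterable[String]): String =
    values.asScala.mkString(",")

  def main(args: Array[String]): Unit = {
    // Assumption: a java.util.List stands in for the reducer's values.
    val javaValues: java.lang.Iterable[String] =
      java.util.Arrays.asList("a", "b", "c")
    println(joinValues(javaValues)) // prints "a,b,c"
  }
}
```

The explicit call makes it visible where the Java collection is wrapped, which helps when tracking down how many times it is traversed.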

If you still have a problem, could you paste your code?

The tricky thing about the values you receive in reduce is that they can be traversed only once. abc.size traverses the values; after that, values is exhausted and cannot be iterated again.
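
The effect can be reproduced without Hadoop. The sketch below is an illustration under assumptions, not the reducer itself: it wraps an ordinary `java.util.Iterator[String]` the same way the question does, and the names `SinglePassSketch` and `countThenCollect` are hypothetical. Calling `size` walks the iterator to its end, so the later `toList` finds nothing left.

```scala
import scala.collection.JavaConverters._

object SinglePassSketch {
  // Hypothetical helper: counts the elements first, then tries to collect them.
  def countThenCollect(jit: java.util.Iterator[String]): (Int, List[String]) = {
    val abc = jit.asScala
    val n = abc.size          // consumes the iterator while counting
    val leftover = abc.toList // the iterator is already at its end
    (n, leftover)
  }

  def main(args: Array[String]): Unit = {
    val jit = java.util.Arrays.asList("a", "b", "c").iterator()
    val (n, leftover) = countThenCollect(jit)
    println("size " + n)        // prints "size 3"
    println("left " + leftover) // prints "left List()"
  }
}
```

With a plain `java.util.List` you could ask for a fresh iterator; with the values passed to reduce, as described above, you only get the one pass, so any counting has to happen inside the same loop that processes the values.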

So the correct code should be:

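// Do not touch values before this loop: it can be traversed only once.
for (value <- values) {
    // Copy the content out right away.
    val v = value.toString
    // Do not store value itself: the object is reused, so its content changes while the reference stays the same.
}
// Do not use values after the loop: it has already been consumed.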

As the comments in the code note, the type of value is Text. While you traverse values, the content of value changes but the reference stays the same. So do not save value itself in a collection, or you will end up with a collection whose items are all identical.
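
To make the reuse problem concrete, here is a small simulation that does not depend on Hadoop. Everything in it is an assumption for illustration: `Holder` is a hypothetical mutable stand-in for `Text`, and `reusingIterator` mimics an iterator that hands back the same object on every step with new content. Saving the references yields identical items, while copying the content (here via `toString`) keeps each value.

```scala
import scala.collection.mutable.ArrayBuffer

object ReuseSketch {
  // Hypothetical stand-in for a reused Text object: one mutable holder.
  final class Holder(var content: String) {
    override def toString: String = content
  }

  // Returns the same Holder instance every time, overwriting its content.
  def reusingIterator(items: Seq[String]): Iterator[Holder] = {
    val shared = new Holder("")
    items.iterator.map { s => shared.content = s; shared }
  }

  // Wrong: stores the reference, so every entry ends up showing the last value.
  def saveReferences(items: Seq[String]): Seq[String] = {
    val saved = ArrayBuffer.empty[Holder]
    for (value <- reusingIterator(items)) saved += value
    saved.map(_.toString).toList
  }

  // Right: copies the content out during the single pass.
  def saveCopies(items: Seq[String]): Seq[String] = {
    val saved = ArrayBuffer.empty[String]
    for (value <- reusingIterator(items)) saved += value.toString
    saved.toList
  }

  def main(args: Array[String]): Unit = {
    println(saveReferences(Seq("a", "b", "c")).mkString(",")) // prints "c,c,c"
    println(saveCopies(Seq("a", "b", "c")).mkString(","))     // prints "a,b,c"
  }
}
```

In a real reducer the equivalent copy would be `value.toString`, as in the answer's code, or a fresh `new Text(value)` if a Text object is needed.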

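Notice: Technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.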

 