
How to write Hadoop MapReduce programs in Scala

I am writing a MapReduce application in Scala. Up to the map function everything works fine, but I am running into a problem while writing the reducer.

override def reduce(key: Text, values: java.lang.Iterable[Text],
                    context: ReducerContext): Unit = {
  // reducer logic goes here
}

ReducerContext is defined so that it refers to the inner Context class, so that part is fine.

The issue is with the (Java) Iterable. I am not able to iterate through it. I understand that I first have to convert it into a Scala Iterable and then iterate over it. I did that, but I still didn't get any results.

I have tried both scala.collection.JavaConverters._ and JavaConversions._. Here are a few scenarios that didn't work out:

val jit: java.util.Iterator[Text] = values.iterator()
val abc = JavaConversions.asScalaIterator(jit) // or: val abc = jit.asScala
println("size " + abc.size) // it displays the proper size
for (temp <- abc) {
  // it doesn't come inside this loop
}

Similarly, I have tried converting this Iterator into a list/array, but all in vain. Once I convert it (toList/toArray), the size of the resulting list/array becomes 0. No matter what I do, I am not able to iterate through it.

I appreciate any help on this.

Thanks

You can import JavaConversions to convert the Java Iterable to a Scala one automatically:

import scala.collection.JavaConversions._
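Note that JavaConversions was deprecated in Scala 2.12 and removed in 2.13. The explicit JavaConverters style (`.asScala`), which the question also mentions, does the same conversion but keeps it visible. A minimal sketch, assuming a plain `java.lang.Iterable[String]` in place of Hadoop's `java.lang.Iterable[Text]` so it runs without Hadoop on the classpath; `ConvertersSketch` and `sumLengths` are hypothetical names:

```scala
import scala.collection.JavaConverters._

object ConvertersSketch {
  // Hypothetical helper standing in for a reducer body. Hadoop would pass
  // java.lang.Iterable[Text]; String is used here so no Hadoop jar is needed.
  def sumLengths(values: java.lang.Iterable[String]): Int = {
    var total = 0
    // .asScala wraps the Java Iterable explicitly; the loop walks it exactly once.
    for (v <- values.asScala) {
      total += v.length
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val values: java.lang.Iterable[String] = java.util.Arrays.asList("ab", "cde")
    println(sumLengths(values)) // prints 5
  }
}
```

On Scala 2.13 and later, `scala.jdk.CollectionConverters._` provides the same `.asScala` and is the non-deprecated import.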

If you still have a problem, can you paste your code?

The tricky thing about the values you receive in reduce is that they can only be traversed once. abc.size traverses values, so after that call the values have already been consumed and any further loop, toList or toArray sees nothing.
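The effect can be reproduced without Hadoop. In this minimal sketch a plain Java iterator stands in for `values.iterator()` (an assumption: Hadoop's own iterator behaves the same single-pass way), and `SinglePassDemo` and `sizeThenList` are hypothetical names:

```scala
import scala.collection.JavaConverters._

object SinglePassDemo {
  // Returns (the size reported first, the elements left for a second traversal).
  def sizeThenList(jit: java.util.Iterator[String]): (Int, List[String]) = {
    val it = jit.asScala // wraps the Java iterator; no copy is made
    val n = it.size      // traverses, and therefore consumes, every element
    val rest = it.toList // nothing left: the iterator is already exhausted
    (n, rest)
  }

  def main(args: Array[String]): Unit = {
    val jit = java.util.Arrays.asList("a", "b", "c").iterator()
    val (n, rest) = sizeThenList(jit)
    println("size " + n)         // prints "size 3"
    println("left " + rest.size) // prints "left 0"
  }
}
```

This matches the symptoms in the question: the size is correct, but the loop body and the converted list are empty.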

So the correct code should look like this:

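// don't use values before the loop (e.g. values.size would consume it)
for (value <- values) { // relies on import scala.collection.JavaConversions._
    // do something
    val v = value.toString
    // Don't save value itself: it is reused. Its content changes, but the reference stays the same.
}
// don't use values after the loop either: it has already been consumed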
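If the reducer needs several results from the same values (say, a count and the distinct entries), they can all be gathered inside that one loop. A minimal sketch, again assuming `String` in place of `Text` and a hypothetical `summarize` helper; with Hadoop you would add `value.toString` rather than `value`:

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable

object OnePassSketch {
  // Hypothetical reducer body: count the values and collect the distinct ones
  // in a single traversal, because the iterable cannot be walked twice.
  def summarize(values: java.lang.Iterable[String]): (Int, Set[String]) = {
    var count = 0
    val seen = mutable.Set.empty[String]
    for (value <- values.asScala) {
      count += 1
      seen += value // with Hadoop Text, store value.toString, never value itself
    }
    (count, seen.toSet)
  }

  def main(args: Array[String]): Unit = {
    val (count, distinct) = summarize(java.util.Arrays.asList("x", "y", "x"))
    println(s"count=$count distinct=$distinct")
  }
}
```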

As I mentioned in the code comment, the type of value is Text. As you traverse values, the content of value changes, but the reference stays the same. So don't try to store value itself in a collection, or you will end up with a collection in which all the items are the same.
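The reuse problem can be simulated without Hadoop. This minimal sketch assumes a hypothetical `reusingIterator` that, like Hadoop's value iterator, hands out the same mutable object (a `StringBuilder` here instead of `Text`) on every step and only overwrites its contents. Keeping references yields copies of the last value; copying the contents while each value is current gives the expected result. With real Hadoop types, `value.toString` or `new Text(value)` makes such a copy.

```scala
import scala.collection.mutable.ListBuffer

object ReuseSketch {
  // Hypothetical stand-in for Hadoop's value iterable: it returns the SAME
  // mutable object on every step and only overwrites its contents.
  def reusingIterator(data: Seq[String]): Iterator[StringBuilder] = {
    val shared = new StringBuilder
    data.iterator.map { s =>
      shared.clear()
      shared.append(s)
      shared
    }
  }

  // Wrong: keeps references, so every element shows the last value seen.
  def keepReferences(data: Seq[String]): List[String] = {
    val saved = ListBuffer.empty[StringBuilder]
    for (v <- reusingIterator(data)) saved += v
    saved.toList.map(_.toString)
  }

  // Right: copies each value's contents while it is still current.
  def keepCopies(data: Seq[String]): List[String] = {
    val saved = ListBuffer.empty[String]
    for (v <- reusingIterator(data)) saved += v.toString
    saved.toList
  }

  def main(args: Array[String]): Unit = {
    val data = Seq("a", "b", "c")
    println(keepReferences(data)) // prints List(c, c, c)
    println(keepCopies(data))     // prints List(a, b, c)
  }
}
```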
