
Map with JavaPairRDD with List values

I am trying to iterate over a JavaPairRDD whose values are Lists. I want to process each entry once, but it seems like I am always iterating over all the elements in the list of values. For instance, I have a pair RDD like this:

[(0,[date, date, date]), (1,[str, str, str]), (2,[str, str, str]), (3,[str, int, str]), (4,[int, int, int]), (5,[float, float, int]), (6,[float, float, float])]

I want to extract the most common element of the value list for each entry of the pair RDD. So for this one, I want:

[date, str, str, str, int, float, float]

How do I do this? Below are a few attempts I made, but they all seem to iterate over every element in the value. I defined a function that returns the most common element of a list and tried this:

JavaRDD<String> resultrdd = pair_rdd.map(e -> mostCommon(e._2));

and this

JavaRDD<String> result = pair_rdd.flatMap(new FlatMapFunction<Tuple2<Integer, List<String>>, String>() {

    @Override
    public Iterator<String> call(Tuple2<Integer, List<String>> t) throws Exception {
        List<String> result = new ArrayList<String>();
        List<String> type = t._2;
        result.add(mostCommon(type));
        return result.iterator();
    }

});

Both attempts return far more elements than expected, as if every element of each list were processed:

[date, str, str, str, int, float, float, date, str, int, str, int, float, float, date, str, str, str, int, int, float]

I think e._2 is not referring to the whole list, but to each element of the list. Any help?

Edit: Here is my mostCommon method. If the list contains "None", it prefers any other type over it.

public static String mostCommon(List<String> list) {
    // Count occurrences of each value.
    Map<String, Integer> map = new HashMap<>();
    for (String t : list) {
        Integer val = map.get(t);
        map.put(t, val == null ? 1 : val + 1);
    }

    // Pick the most frequent value, skipping "None".
    Entry<String, Integer> max = null;
    for (Entry<String, Integer> e : map.entrySet()) {
        if (!"None".equals(e.getKey())) {
            if (max == null || e.getValue() > max.getValue())
                max = e;
        }
    }

    // Fall back to "None" when the list holds nothing else.
    return max == null ? "None" : max.getKey();
}

You want map, not flatMap. You're transforming each tuple into a single element, not flattening nested lists.

And you just need to return mostCommon(t._2), assuming that returns a single string.
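To illustrate the one-output-per-entry behaviour of map, here is a minimal sketch that uses plain java.util.stream as a stand-in for Spark, so it runs without a cluster. The class name MapVsFlatMapSketch and the helper mostCommonPerEntry are made up for this example, and Map.Entry plays the role of Tuple2. The lambda receives the whole value list, just as e._2 does in the question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MapVsFlatMapSketch {

    // Same idea as the question's mostCommon: most frequent value, ignoring "None".
    public static String mostCommon(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        String best = "None";
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (!"None".equals(e.getKey()) && e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // map: exactly one output per (key, list) pair; the lambda sees the whole list.
    public static List<String> mostCommonPerEntry(List<Map.Entry<Integer, List<String>>> pairs) {
        return pairs.stream()
                .map(e -> mostCommon(e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, List<String>>> pairs = List.of(
                Map.entry(0, List.of("date", "date", "date")),
                Map.entry(3, List.of("str", "int", "str")),
                Map.entry(5, List.of("float", "float", "int")));
        System.out.println(mostCommonPerEntry(pairs)); // [date, str, float]
    }
}
```

Three pairs go in and three strings come out. If a map over the real RDD yields more results than there are distinct keys, the RDD itself holds more pairs than expected.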

I think e._2 is not referring to the whole list,

It has to be the whole list. Otherwise, this line wouldn't compile:

List<String> type = t._2;

Your first 7 elements are correct, so I think you have extra data in your RDD.
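One way to check for extra data is to count how many pairs share each key. On the real RDD, pair_rdd.count() and pair_rdd.countByKey() should give this information directly. The sketch below reproduces the same check in plain Java so it can be run standalone; the class DuplicateKeyCheck and the sample data are assumptions made for illustration, not taken from the question's dataset.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DuplicateKeyCheck {

    // Counts how many pairs share each key; any count above 1 means repeated entries.
    public static Map<Integer, Long> countByKey(List<Map.Entry<Integer, List<String>>> pairs) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> e : pairs) {
            counts.merge(e.getKey(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: key 0 appears twice, as it would if rows were loaded more than once.
        List<Map.Entry<Integer, List<String>>> pairs = List.of(
                Map.entry(0, List.of("date", "date", "date")),
                Map.entry(1, List.of("str", "str", "str")),
                Map.entry(0, List.of("date", "date", "date")));
        System.out.println(countByKey(pairs)); // {0=2, 1=1}
    }
}
```

If every key shows a count of 3, that would match the 21 results in the question for 7 expected keys.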


 