I'm very new to Apache Spark. I need a Java solution for the problem below:
JavaPairRDD:    JavaRDD:    Desired Output:
1,USA           France      2,England
2,England       England     3,France
3,France
4,Italy
Edit: Frankly, I have no idea what to try. Like I said, I'm a complete newbie at Spark. I thought I could use a method like intersection, but it requires another JavaPairRDD object. I don't think the filter method will work for this problem. For example,
Function<Tuple2<String, String>, Boolean> myFilter =
    new Function<Tuple2<String, String>, Boolean>() {
        public Boolean call(Tuple2<String, String> keyValue)
        {
            return ("some boolean expression");
        }
    };
myPairRDD.filter(myFilter);
I have no idea what kind of boolean expression I could write in place of "some boolean expression" in the function above. Sorry for my English, by the way.
There are at least three options:
1. Convert JavaRDD to JavaPairRDD with arbitrary value, join and map to drop dummy values.
2. If JavaRDD is small, collect distinct values, convert to Set, broadcast and use it to filter JavaPairRDD.
3. Convert RDDs to DataFrames and use inner join followed by drop / select.
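The first two options could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes Spark 2.x or later with Java 8 lambdas, a local master, and String ids; the class name FilterPairsByValues and the methods viaJoin and viaBroadcast are hypothetical names chosen for illustration. Note that for the join the pair RDD has to be keyed by the country rather than by the id, so the sketch swaps key and value first and swaps them back afterwards.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import scala.Tuple2;

// Hypothetical class and method names, used only for this illustration.
public class FilterPairsByValues {

    // Option 1: key both RDDs by country, join, then drop the dummy value.
    static JavaPairRDD<String, String> viaJoin(JavaPairRDD<String, String> pairs,
                                               JavaRDD<String> wanted) {
        // (id, country) -> (country, id), so that the join key is the country.
        JavaPairRDD<String, String> byCountry =
            pairs.mapToPair(t -> new Tuple2<>(t._2(), t._1()));
        // country -> (country, true); distinct() avoids duplicated join rows.
        JavaPairRDD<String, Boolean> dummy =
            wanted.distinct().mapToPair(c -> new Tuple2<>(c, Boolean.TRUE));
        // Joined rows look like (country, (id, true)); restore (id, country).
        return byCountry.join(dummy)
                        .mapToPair(t -> new Tuple2<>(t._2()._1(), t._1()));
    }

    // Option 2: collect the distinct values, broadcast them as a Set,
    // and keep only the pairs whose value is a member of that Set.
    static JavaPairRDD<String, String> viaBroadcast(JavaSparkContext sc,
                                                    JavaPairRDD<String, String> pairs,
                                                    JavaRDD<String> wanted) {
        Set<String> values = new HashSet<>(wanted.distinct().collect());
        Broadcast<Set<String>> allowed = sc.broadcast(values);
        return pairs.filter(t -> allowed.value().contains(t._2()));
    }

    public static void main(String[] args) {
        // Assumption: run locally; replace the master for a real cluster.
        SparkConf conf = new SparkConf().setAppName("filter-pairs").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Tuple2<String, String>> data = Arrays.asList(
                new Tuple2<>("1", "USA"), new Tuple2<>("2", "England"),
                new Tuple2<>("3", "France"), new Tuple2<>("4", "Italy"));
            JavaPairRDD<String, String> pairs = sc.parallelizePairs(data);
            JavaRDD<String> wanted = sc.parallelize(Arrays.asList("France", "England"));

            // sortByKey() only makes the printed order deterministic.
            System.out.println(viaJoin(pairs, wanted).sortByKey().collect());
            System.out.println(viaBroadcast(sc, pairs, wanted).sortByKey().collect());
        }
    }
}
```

With the sample data from the question, both calls should yield the pairs (2,England) and (3,France). The broadcast variant avoids a shuffle but only makes sense when the set of distinct values fits comfortably in memory on the driver and on every executor; otherwise the join is probably the safer choice.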