
Apache Spark: convert an RDD of collections into a single flat RDD (Java)

I have the following RDD in my Java code:

(1, List(1596, 1617, 1929, 2399, 2674))
(2, List(1702, 1785, 1933, 2054, 2583, 2913))
(3, List(1982, 2002, 2048, 2341, 2666))

I want to create another RDD whose contents look like this (not necessarily in the same order):

1596
1617
1929
2399
2674
1702
1785
1933
2054
2583
2913
1982
2002
2048
2341
2666

I am not sure how to transform an RDD whose elements are collections of objects (JavaRDD<ArrayList<String>>) into a single RDD (JavaRDD<String>) that contains all of those objects. I would appreciate it if anyone could point me to a Java resource.
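For the exact types named above, `JavaRDD.flatMap` does this: it maps each element (here an `ArrayList<String>`) to several output records and flattens them into one RDD. The following is a minimal sketch, not a definitive implementation. It assumes Spark 2.x or later (where `FlatMapFunction` returns an `Iterator`; in Spark 1.x it returned an `Iterable`), `spark-core` on the classpath, and a local master. The class name `FlattenLists` and its methods are hypothetical, and the values are the numbers above written as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical example class; assumes Spark 2.x+ (FlatMapFunction returns an Iterator).
public class FlattenLists {

    // Turns JavaRDD<ArrayList<String>> into JavaRDD<String>:
    // each list's iterator is expanded so every element becomes its own record.
    static JavaRDD<String> flatten(JavaRDD<ArrayList<String>> lists) {
        return lists.flatMap(list -> list.iterator());
    }

    // Builds a small local RDD of lists, flattens it and returns the collected elements.
    public static List<String> run() {
        SparkConf conf = new SparkConf().setAppName("flatten-lists").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            List<ArrayList<String>> data = Arrays.asList(
                new ArrayList<>(Arrays.asList("1596", "1617", "1929", "2399", "2674")),
                new ArrayList<>(Arrays.asList("1702", "1785", "1933", "2054", "2583", "2913")),
                new ArrayList<>(Arrays.asList("1982", "2002", "2048", "2341", "2666")));
            JavaRDD<ArrayList<String>> lists = sc.parallelize(data);
            return flatten(lists).collect();
        } finally {
            // Always release the local Spark context.
            sc.stop();
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The lambda captures nothing, and because `FlatMapFunction` extends `Serializable` it can be shipped to executors as is.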

You can do the same in Scala with flatMap, which maps each record to a collection and flattens the results into one RDD:

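// Assumes `sc` is an existing SparkContext (predefined in spark-shell)
val data = List((1, List(1596, 1617, 1929, 2399, 2674)),
    (2, List(1702, 1785, 1933, 2054, 2583, 2913)),
    (3, List(1982, 2002, 2048, 2341, 2666)))

val rdd_data = sc.parallelize(data)
// flatMap passes a single argument (the (key, list) tuple), so pattern-match it,
// drop the key and return the list: each element becomes its own record
val rdd_flattened = rdd_data.flatMap { case (_, values) => values }
rdd_flattened.collect().foreach(println)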
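Because the question asks for Java, here is a rough Java counterpart of the Scala snippet for the keyed data, with each record a `scala.Tuple2` of key and list. It is a sketch under the same assumptions: Spark 2.x or later, `spark-core` on the classpath, and a local master. The class name `FlattenPairs` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical example class mirroring the Scala answer; assumes Spark 2.x+.
public class FlattenPairs {

    public static List<Integer> run() {
        SparkConf conf = new SparkConf().setAppName("flatten-pairs").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            List<Tuple2<Integer, List<Integer>>> data = Arrays.asList(
                new Tuple2<>(1, Arrays.asList(1596, 1617, 1929, 2399, 2674)),
                new Tuple2<>(2, Arrays.asList(1702, 1785, 1933, 2054, 2583, 2913)),
                new Tuple2<>(3, Arrays.asList(1982, 2002, 2048, 2341, 2666)));

            JavaRDD<Tuple2<Integer, List<Integer>>> rddData = sc.parallelize(data);
            // Each record is a (key, list) tuple; returning the list's iterator
            // drops the key and makes every element a separate record.
            JavaRDD<Integer> rddFlattened = rddData.flatMap(pair -> pair._2().iterator());
            return rddFlattened.collect();
        } finally {
            sc.stop();
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

If the data were already a `JavaPairRDD<Integer, List<Integer>>`, calling `values()` first and then `flatMap(list -> list.iterator())` would give the same flattened `JavaRDD<Integer>`.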

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM