
Spark Scala def with yield

In SO 33655920 I came across the following, which works fine:

rdd = sc.parallelize([1, 2, 3, 4], 2)
def f(iterator): yield sum(iterator)
rdd.mapPartitions(f).collect()

In Scala, I cannot seem to write the def in the same shorthand way. What is the equivalent? I have searched and tried, but to no avail.

Thanks in advance.

If you want to sum the values in each partition, you can write something like:

val rdd = sc.parallelize(1 to 4, 2)
def f(i: Iterator[Int]) = Iterator(i.sum)
rdd.mapPartitions(f).collect()
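
Note that, unlike Python, where yield turns f into a generator, the Scala mapPartitions expects a function of type Iterator[T] => Iterator[U], so the single sum has to be wrapped in an Iterator explicitly.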

yield sum(iterator) in Python produces a single value: the sum of the iterator's elements. The equivalent in Scala would be:

val rdd = sc.parallelize(Array(1, 2, 3, 4), 2)
rdd.mapPartitions(it => Iterator(it.sum)).collect()
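
For reference, here is a minimal, self-contained sketch showing both the named-def form and the inline shorthand (the app name and local master are illustrative, not from the original post). With 1 to 4 split over 2 partitions, the partition sums are 3 and 7.

import org.apache.spark.{SparkConf, SparkContext}

object PartitionSum {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("PartitionSum").setMaster("local[2]"))

    val rdd = sc.parallelize(1 to 4, 2)

    // Named function form: mapPartitions expects Iterator[Int] => Iterator[Int]
    def f(i: Iterator[Int]) = Iterator(i.sum)
    println(rdd.mapPartitions(f).collect().mkString(", "))               // 3, 7

    // Inline shorthand form, closest to the Python one-liner
    println(rdd.mapPartitions(it => Iterator(it.sum)).collect().mkString(", "))  // 3, 7

    sc.stop()
  }
}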
