It looks like the RDD API is going to be removed from Spark. The announcement says:
"DataFrame-based API is primary API"
"The RDD-based API is expected to be removed in Spark 3.0"
How, then, do you implement programs like word count in Spark?
Data that you manipulate as tuples with the RDD API can be thought of, and manipulated, as columns/fields in a SQL-like manner with the DataFrame API.
import org.apache.spark.sql.functions.{col, explode, split}

// df has a single string column "lines", one line of text per row:
// split each line into words, then count occurrences of each word
df.withColumn("word", explode(split(col("lines"), " ")))
  .groupBy("word")
  .count()
  .orderBy(col("count").desc())
  .show()
+---------+-----+
| word|count|
+---------+-----+
| foo| 5|
| bar| 2|
| toto| 1|
...
+---------+-----+
Note: split, explode, and col come from org.apache.spark.sql.functions.
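For context, a minimal end-to-end sketch of the above, assuming a local SparkSession and a hypothetical input file words.txt (both names are illustrative, not from the original answer):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

// Hypothetical session and input path, for illustration only
val spark = SparkSession.builder.appName("WordCount").getOrCreate()

// read.textFile yields a Dataset[String]; rename its column to "lines"
val df = spark.read.textFile("words.txt").toDF("lines")

df.withColumn("word", explode(split(col("lines"), " ")))
  .groupBy("word")
  .count()
  .orderBy(col("count").desc())
  .show()

spark.stop()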