Is it possible to tell Spark's `dropDuplicates` to drop the second occurrence of a row instead of the first one?
scala> df.show()
+-----------+
| _1|
+-----------+
|1 2 3 4 5 6|
|9 4 5 8 7 7|
|1 2 3 4 5 6|
+-----------+
scala> val newDf = df.dropDuplicates()
newDf: org.apache.spark.sql.DataFrame = [_1: string]
scala> newDf.show()
+-----------+
| _1|
+-----------+
|9 4 5 8 7 7|
|1 2 3 4 5 6|
+-----------+
Rank/index the rows, so that rows with identical values share a group, then drop every record whose index/rank within its group is greater than 1.
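A minimal sketch of that rank-then-filter idea using plain Scala collections as a stand-in for the DataFrame above (the object and method names here are hypothetical). In Spark itself, the same idea is typically expressed by attaching an ordering column with `monotonically_increasing_id()`, computing `row_number()` over a window partitioned by the row value, and keeping only rows where the row number equals 1.

```scala
// Sketch: keep the first occurrence of each value, dropping later duplicates.
object DedupKeepFirst {
  def dedupKeepFirst[A](rows: Seq[A]): Seq[A] =
    rows.zipWithIndex        // attach an index (the "ordering" column)
      .groupBy(_._1)         // group identical rows together
      .values
      .map(_.minBy(_._2))    // within each group keep rank 1, i.e. the smallest index
      .toSeq
      .sortBy(_._2)          // restore the original row order
      .map(_._1)

  def main(args: Array[String]): Unit = {
    val rows = Seq("1 2 3 4 5 6", "9 4 5 8 7 7", "1 2 3 4 5 6")
    println(dedupKeepFirst(rows)) // List(1 2 3 4 5 6, 9 4 5 8 7 7)
  }
}
```

Unlike `dropDuplicates()`, which gives no guarantee about which duplicate survives, this explicitly keeps the occurrence with the smallest index.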