
Spark dataframe drop duplicates

Is it possible to tell Spark's dropDuplicates to drop the second occurrence instead of the first one?

scala> df.show()
+-----------+
|         _1|
+-----------+
|1 2 3 4 5 6|
|9 4 5 8 7 7|
|1 2 3 4 5 6|
+-----------+


scala> val newDf = df.dropDuplicates()
newDf: org.apache.spark.sql.DataFrame = [_1: string]

scala> newDf.show()
+-----------+                                                                   
|         _1|
+-----------+
|9 4 5 8 7 7|
|1 2 3 4 5 6|
+-----------+

Rank/index the rows that share the same value, then drop the entries of all records whose index/rank is greater than 1.
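One way to sketch this ranking approach: tag each row with its original position via monotonically_increasing_id, then use a row_number window partitioned by the value and ordered by that position, keeping only rank 1 so the first occurrence survives and later duplicates are dropped. The column names `_id` and `_rank` below are illustrative choices, not from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

val spark = SparkSession.builder().master("local[*]").appName("dedup").getOrCreate()
import spark.implicits._

val df = Seq("1 2 3 4 5 6", "9 4 5 8 7 7", "1 2 3 4 5 6").toDF("_1")

// Tag each row with its original position, rank rows sharing the same
// value by that position, and keep only the first occurrence (rank 1).
val withId = df.withColumn("_id", monotonically_increasing_id())
val w = Window.partitionBy($"_1").orderBy($"_id")
val deduped = withId
  .withColumn("_rank", row_number().over(w))
  .filter($"_rank" === 1)
  .drop("_id", "_rank")

deduped.show()
```

To drop the second occurrence while keeping the first deterministically, this ordering by original position is what plain dropDuplicates() does not guarantee, since it keeps an arbitrary row per key.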
