
Drop duplicated rows where all columns are the same except one in pandas

I have seen similar questions, but none of them answers mine. For example, I have a pandas DataFrame with the columns 'A', 'B', 'C', 'D' and 'E'. First, I want to keep every row whose combination of values in 'A', 'B', 'C' and 'D' differs from the other rows. Second, when all the columns except 'E' are the same, I would like to keep the row where 'E' is largest and drop the other rows. For instance, suppose two (or more) rows have identical 'A', 'B', 'C' and 'D' values, but 'E' is 10 in one row and 12 in another. I want to keep the row with 12 and drop the other one.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 3, size=(10, 5)), columns=list('ABCDE'))
df

Out[3]: 
   A  B  C  D  E
0  2  2  1  2  2
1  1  2  1  2  2
2  2  1  2  1  2
3  1  2  1  1  1
4  1  2  1  2  2
5  1  2  2  1  1
6  2  2  2  2  2
7  1  1  1  2  2
8  2  1  1  2  2
9  1  1  1  2  1

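# sort by column 'E', largest to smallest, and keep the sorted frame
# (sort_values returns a new DataFrame; it does not sort in place)
df = df.sort_values(by=['E'], ascending=False)
df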

Out[4]: 
   A  B  C  D  E
0  2  2  1  2  2
1  1  2  1  2  2
2  2  1  2  1  2
4  1  2  1  2  2
6  2  2  2  2  2
7  1  1  1  2  2
8  2  1  1  2  2
3  1  2  1  1  1
5  1  2  2  1  1
9  1  1  1  2  1

# drop all duplicate rows, using columns 'A', 'B', 'C', and 'D'
df.drop_duplicates(subset=['A', 'B', 'C', 'D'], keep='first')

Out[5]: 
   A  B  C  D  E
0  2  2  1  2  2
1  1  2  1  2  2
2  2  1  2  1  2
6  2  2  2  2  2
7  1  1  1  2  2
8  2  1  1  2  2
3  1  2  1  1  1
5  1  2  2  1  1
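
The two steps above can also be chained, so the sorted result is not lost between calls. The following is a minimal sketch and not a definitive implementation: the sample values and the names `key_cols`, `deduped` and `grouped` are made up for illustration. It also shows a `groupby` alternative that computes the maximum 'E' for each 'A' to 'D' combination.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 agree on A-D and differ only in E.
df = pd.DataFrame(
    {
        "A": [1, 2, 1],
        "B": [1, 1, 1],
        "C": [2, 2, 2],
        "D": [1, 2, 1],
        "E": [10, 5, 12],
    }
)

key_cols = ["A", "B", "C", "D"]

# Sort so the largest E comes first, keep the first row of each
# A-D combination, then restore the original row order.
deduped = (
    df.sort_values(by="E", ascending=False)
      .drop_duplicates(subset=key_cols, keep="first")
      .sort_index()
)
print(deduped)

# Alternative: take the maximum E per A-D group.
# The original row index is not preserved here.
grouped = df.groupby(key_cols, as_index=False)["E"].max()
print(grouped)
```

The chained version keeps the original index labels of the surviving rows, which helps if other columns or later lookups depend on them. The `groupby` version is shorter but returns a fresh index and only the key columns plus 'E'.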

