
I have a pd.DataFrame with non-unique values in a selected column. How can I keep only the rows with unique values of that column?

The example is shown below. How can I drop rows with non-unique values in the 'signal' column?

import pandas as pd

cols = ['signal', 'metabolite', 'adduct', 's_ind', 'm_ind', 'a_ind', 'distance']
data = [[0.500001, 1.000002, -0.5, 1, 1, 2, 0.000001], 
[0.500001, 0.000002, 0.5, 1, 2, 1, 0.000001], 
[0.500002, 1.000002, -0.5, 2, 1, 2, 0.000000], 
[0.500002, 0.000002, 0.5, 2, 2, 1, 0.000000], 
[0.500003, 1.000002, -0.5, 3, 1, 2, 0.000001], 
[0.500003, 0.000002, 0.5, 3, 2, 1, 0.000001], 
[1.000000, 1.000002, -0.5, 4, 1, 2, 0.499998], 
[1.000000, 0.000002, 0.5, 4, 2, 1, 0.499998], 
[0.000001, 1.000002, -0.5, 5, 1, 2, 0.500001], 
[0.000001, 0.000002, 0.5, 5, 2, 1, 0.500001]]

df = pd.DataFrame(data=data, columns=cols)
display(df)  # display() is available in Jupyter/IPython; use print(df) in a plain script

Just call drop_duplicates and pass the column list to the subset parameter. By default it keeps only the first row for each repeated value. You can pass one or more columns; with several columns, rows count as duplicates only when they match on all of them.

df.drop_duplicates(subset=['signal'])

     signal  metabolite  adduct  s_ind  m_ind  a_ind  distance
0  0.500001    1.000002    -0.5      1      1      2  0.000001
2  0.500002    1.000002    -0.5      2      1      2  0.000000
4  0.500003    1.000002    -0.5      3      1      2  0.000001
6  1.000000    1.000002    -0.5      4      1      2  0.499998
8  0.000001    1.000002    -0.5      5      1      2  0.500001

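You can also pass keep=False if you don't want to keep any row whose value is repeated. Note that in the example above every 'signal' value occurs twice, so keep=False would return an empty DataFrame.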
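To make the difference between the keep options concrete, here is a minimal sketch. The frame df_demo and its values are made up for illustration and are not the question's data; only drop_duplicates and its subset and keep parameters come from the answer above.

```python
import pandas as pd

# Hypothetical toy frame: signal 0.5 appears twice, 1.0 and 2.0 appear once.
df_demo = pd.DataFrame({"signal": [0.5, 0.5, 1.0, 2.0],
                        "s_ind": [1, 2, 3, 4]})

# keep="first" (the default): keep the first row of each duplicated group.
first_kept = df_demo.drop_duplicates(subset=["signal"])

# keep="last": keep the last row of each duplicated group instead.
last_kept = df_demo.drop_duplicates(subset=["signal"], keep="last")

# keep=False: drop every row whose signal value occurs more than once.
none_kept = df_demo.drop_duplicates(subset=["signal"], keep=False)

print(first_kept["s_ind"].tolist())  # [1, 3, 4]
print(last_kept["s_ind"].tolist())   # [2, 3, 4]
print(none_kept["s_ind"].tolist())   # [3, 4]
```

Which option fits depends on whether one representative row per signal is wanted (keep="first" or keep="last") or only signals that were never repeated (keep=False).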

You're looking for DataFrame.drop_duplicates() (it is a method of the DataFrame, not a top-level pd function):

df = df.drop_duplicates("signal")
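The assignment matters because drop_duplicates returns a new DataFrame and leaves the original untouched unless inplace=True is passed. The surviving rows also keep their original index labels (0, 2, 4, ... in the output shown earlier). A small sketch of both points, using a made-up frame df_demo and assuming pandas 1.0 or later for the ignore_index parameter:

```python
import pandas as pd

# Hypothetical toy frame, not the question's data.
df_demo = pd.DataFrame({"signal": [0.5, 0.5, 1.0, 1.0],
                        "m_ind": [1, 2, 1, 2]})

# Returns a new frame; original index labels of the kept rows are preserved.
deduped = df_demo.drop_duplicates("signal")
print(deduped.index.tolist())      # [0, 2]

# ignore_index=True renumbers the result from 0 (assumes pandas >= 1.0).
renumbered = df_demo.drop_duplicates("signal", ignore_index=True)
print(renumbered.index.tolist())   # [0, 1]

# The original frame still has all four rows.
print(len(df_demo))                # 4
```

On older pandas versions, calling .reset_index(drop=True) on the result gives the same renumbering.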
