The example is in the picture. How could I drop rows with non-unique values in column 'signal'?
import pandas as pd

cols = ['signal', 'metabolite', 'adduct', 's_ind', 'm_ind', 'a_ind', 'distance']
data = [[0.500001, 1.000002, -0.5, 1, 1, 2, 0.000001],
        [0.500001, 0.000002, 0.5, 1, 2, 1, 0.000001],
        [0.500002, 1.000002, -0.5, 2, 1, 2, 0.000000],
        [0.500002, 0.000002, 0.5, 2, 2, 1, 0.000000],
        [0.500003, 1.000002, -0.5, 3, 1, 2, 0.000001],
        [0.500003, 0.000002, 0.5, 3, 2, 1, 0.000001],
        [1.000000, 1.000002, -0.5, 4, 1, 2, 0.499998],
        [1.000000, 0.000002, 0.5, 4, 2, 1, 0.499998],
        [0.000001, 1.000002, -0.5, 5, 1, 2, 0.500001],
        [0.000001, 0.000002, 0.5, 5, 2, 1, 0.500001]]
df = pd.DataFrame(data=data, columns=cols)
display(df)  # display() is available in Jupyter; use print(df) elsewhere
Just call drop_duplicates and pass the column list to the subset parameter. By default it keeps only the first row for each repeated value (you can pass one or more columns in which to look for non-unique values):
df.drop_duplicates(subset=['signal'])
     signal  metabolite  adduct  s_ind  m_ind  a_ind  distance
0  0.500001    1.000002    -0.5      1      1      2  0.000001
2  0.500002    1.000002    -0.5      2      1      2  0.000000
4  0.500003    1.000002    -0.5      3      1      2  0.000001
6  1.000000    1.000002    -0.5      4      1      2  0.499998
8  0.000001    1.000002    -0.5      5      1      2  0.500001
You can also pass keep=False if you don't want to keep any of the rows with non-unique values at all.
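To make the difference concrete, here is a minimal sketch of keep=False next to the default. The small frame below is made up for illustration (it is not the asker's data), and the names unique_only and first_only are just placeholders:

```python
import pandas as pd

# Illustrative frame with made-up values (an assumption, not the asker's data).
df = pd.DataFrame({
    "signal": [0.5, 0.5, 0.7, 0.9, 0.9, 1.1],
    "s_ind": [1, 1, 2, 3, 3, 4],
})

# keep=False drops every row whose 'signal' value occurs more than once.
unique_only = df.drop_duplicates(subset=["signal"], keep=False)
print(unique_only)

# The default (keep="first") keeps one row per distinct 'signal' value.
first_only = df.drop_duplicates(subset=["signal"])
print(first_only)
```

Note that in the question's data every 'signal' value appears exactly twice, so keep=False would return an empty DataFrame there.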
You're looking for DataFrame.drop_duplicates():
df = df.drop_duplicates("signal")