I have a pd.DataFrame with non-unique values in a selected column. How can I keep only the rows with unique values in that column?
The example is in the picture. How could I drop rows with non-unique values in column 'signal'?
import pandas as pd

cols = ['signal', 'metabolite', 'adduct', 's_ind', 'm_ind', 'a_ind', 'distance']
data = [[0.500001, 1.000002, -0.5, 1, 1, 2, 0.000001],
[0.500001, 0.000002, 0.5, 1, 2, 1, 0.000001],
[0.500002, 1.000002, -0.5, 2, 1, 2, 0.000000],
[0.500002, 0.000002, 0.5, 2, 2, 1, 0.000000],
[0.500003, 1.000002, -0.5, 3, 1, 2, 0.000001],
[0.500003, 0.000002, 0.5, 3, 2, 1, 0.000001],
[1.000000, 1.000002, -0.5, 4, 1, 2, 0.499998],
[1.000000, 0.000002, 0.5, 4, 2, 1, 0.499998],
[0.000001, 1.000002, -0.5, 5, 1, 2, 0.500001],
[0.000001, 0.000002, 0.5, 5, 2, 1, 0.500001]]
df = pd.DataFrame(data=data, columns=cols)
print(df)  # in a Jupyter notebook, display(df) renders the same frame as a table
Just call drop_duplicates and pass the column list to the subset parameter. For each repeated value it keeps only the first row (you can pass one or more columns to check for duplicates).
df.drop_duplicates(subset=['signal'])
signal metabolite adduct s_ind m_ind a_ind distance
0 0.500001 1.000002 -0.5 1 1 2 0.000001
2 0.500002 1.000002 -0.5 2 1 2 0.000000
4 0.500003 1.000002 -0.5 3 1 2 0.000001
6 1.000000 1.000002 -0.5 4 1 2 0.499998
8 0.000001 1.000002 -0.5 5 1 2 0.500001
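To see which rows would be dropped before dropping them, a boolean mask from duplicated can help. The following is a minimal sketch on a hypothetical toy frame (df_demo is not the frame from the question); it also shows ignore_index=True, which assumes pandas 1.0 or newer, for renumbering the result instead of keeping the original index labels.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question
df_demo = pd.DataFrame({"signal": [0.5, 0.5, 0.7, 0.7], "s_ind": [1, 1, 2, 2]})

# True for every row whose 'signal' value already appeared in an earlier row
mask = df_demo.duplicated(subset=["signal"])

# Selecting the unmarked rows gives the same rows as
# drop_duplicates(subset=["signal"]), with the original index labels kept
via_mask = df_demo[~mask]

# ignore_index=True (assumes pandas >= 1.0) renumbers the result from 0
renumbered = df_demo.drop_duplicates(subset=["signal"], ignore_index=True)

print(mask.tolist())
print(via_mask.index.tolist())
print(renumbered.index.tolist())
```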
You can also pass keep as False if you don't want to include the non-unique values at all.
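The effect of the keep options can be compared side by side. This is a small sketch on a hypothetical toy frame (df_demo is made up for illustration) in which the 'signal' value 0.5 appears twice and the other values once.

```python
import pandas as pd

# Hypothetical toy data: 0.5 is repeated, 1.0 and 2.0 are unique
df_demo = pd.DataFrame({
    "signal": [0.5, 0.5, 1.0, 2.0],
    "metabolite": [1.0, 0.0, 1.0, 0.0],
})

# keep="first" is the default: the first row of each repeated value survives
first_kept = df_demo.drop_duplicates(subset=["signal"])

# keep="last": the last row of each repeated value survives
last_kept = df_demo.drop_duplicates(subset=["signal"], keep="last")

# keep=False: every row whose value occurs more than once is dropped
none_kept = df_demo.drop_duplicates(subset=["signal"], keep=False)

print(first_kept.index.tolist())
print(last_kept.index.tolist())
print(none_kept.index.tolist())
```

Note that in the sample data from the question every 'signal' value appears exactly twice, so keep=False would return an empty frame there.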