

I have a pd.DataFrame with non-unique values in a selected column. How can I keep only rows with unique values in the selected column?

The example is shown below. How can I drop rows with non-unique values in column 'signal'?

import pandas as pd

cols = ['signal', 'metabolite', 'adduct', 's_ind', 'm_ind', 'a_ind', 'distance']
data = [[0.500001, 1.000002, -0.5, 1, 1, 2, 0.000001],
        [0.500001, 0.000002, 0.5, 1, 2, 1, 0.000001],
        [0.500002, 1.000002, -0.5, 2, 1, 2, 0.000000],
        [0.500002, 0.000002, 0.5, 2, 2, 1, 0.000000],
        [0.500003, 1.000002, -0.5, 3, 1, 2, 0.000001],
        [0.500003, 0.000002, 0.5, 3, 2, 1, 0.000001],
        [1.000000, 1.000002, -0.5, 4, 1, 2, 0.499998],
        [1.000000, 0.000002, 0.5, 4, 2, 1, 0.499998],
        [0.000001, 1.000002, -0.5, 5, 1, 2, 0.500001],
        [0.000001, 0.000002, 0.5, 5, 2, 1, 0.500001]]

df = pd.DataFrame(data=data, columns=cols)
print(df)  # in Jupyter, display(df) renders the same table

Just call drop_duplicates and pass the column list to the subset parameter. By default (keep='first') it keeps only the first occurrence of each repeated value. You can pass one or more columns in which the non-unique values should be detected.

df.drop_duplicates(subset=['signal'])

     signal  metabolite  adduct  s_ind  m_ind  a_ind  distance
0  0.500001    1.000002    -0.5      1      1      2  0.000001
2  0.500002    1.000002    -0.5      2      1      2  0.000000
4  0.500003    1.000002    -0.5      3      1      2  0.000001
6  1.000000    1.000002    -0.5      4      1      2  0.499998
8  0.000001    1.000002    -0.5      5      1      2  0.500001
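A minimal sketch of the subset semantics, reusing the question's data: with one column, one row survives per distinct 'signal'; with several columns, a row counts as a duplicate only if all listed columns match. The variable names by_signal and by_pair are illustrative only.

```python
import pandas as pd

# Same data as in the question
cols = ['signal', 'metabolite', 'adduct', 's_ind', 'm_ind', 'a_ind', 'distance']
data = [[0.500001, 1.000002, -0.5, 1, 1, 2, 0.000001],
        [0.500001, 0.000002, 0.5, 1, 2, 1, 0.000001],
        [0.500002, 1.000002, -0.5, 2, 1, 2, 0.000000],
        [0.500002, 0.000002, 0.5, 2, 2, 1, 0.000000],
        [0.500003, 1.000002, -0.5, 3, 1, 2, 0.000001],
        [0.500003, 0.000002, 0.5, 3, 2, 1, 0.000001],
        [1.000000, 1.000002, -0.5, 4, 1, 2, 0.499998],
        [1.000000, 0.000002, 0.5, 4, 2, 1, 0.499998],
        [0.000001, 1.000002, -0.5, 5, 1, 2, 0.500001],
        [0.000001, 0.000002, 0.5, 5, 2, 1, 0.500001]]
df = pd.DataFrame(data=data, columns=cols)

# One column: keep the first row for each distinct 'signal' value
by_signal = df.drop_duplicates(subset=['signal'])

# Several columns: a row is a duplicate only if ALL listed columns match
by_pair = df.drop_duplicates(subset=['signal', 'metabolite'])

print(by_signal.index.tolist())  # [0, 2, 4, 6, 8]
print(len(by_pair))  # 10, every (signal, metabolite) pair here is distinct
```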

You can also pass keep=False if you don't want to keep any of the rows whose value is repeated.
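A short sketch comparing the keep options on the same data. Note that in this particular sample every 'signal' value appears exactly twice, so keep=False removes every row; keep='last' is shown for contrast. The variable names are illustrative.

```python
import pandas as pd

# Same data as in the question
cols = ['signal', 'metabolite', 'adduct', 's_ind', 'm_ind', 'a_ind', 'distance']
data = [[0.500001, 1.000002, -0.5, 1, 1, 2, 0.000001],
        [0.500001, 0.000002, 0.5, 1, 2, 1, 0.000001],
        [0.500002, 1.000002, -0.5, 2, 1, 2, 0.000000],
        [0.500002, 0.000002, 0.5, 2, 2, 1, 0.000000],
        [0.500003, 1.000002, -0.5, 3, 1, 2, 0.000001],
        [0.500003, 0.000002, 0.5, 3, 2, 1, 0.000001],
        [1.000000, 1.000002, -0.5, 4, 1, 2, 0.499998],
        [1.000000, 0.000002, 0.5, 4, 2, 1, 0.499998],
        [0.000001, 1.000002, -0.5, 5, 1, 2, 0.500001],
        [0.000001, 0.000002, 0.5, 5, 2, 1, 0.500001]]
df = pd.DataFrame(data=data, columns=cols)

# keep=False drops every row whose 'signal' value occurs more than once
only_unique = df.drop_duplicates(subset=['signal'], keep=False)
print(only_unique.empty)  # True: each signal in this sample appears twice

# keep='last' keeps the last occurrence of each value instead of the first
last = df.drop_duplicates(subset=['signal'], keep='last')
print(last.index.tolist())  # [1, 3, 5, 7, 9]
```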

You're looking for DataFrame.drop_duplicates() (see the pandas documentation):

df = df.drop_duplicates("signal")
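Two related points, sketched under the assumption of pandas 1.0 or later (where ignore_index was added to drop_duplicates): the same filter can be written as a boolean mask with DataFrame.duplicated, and the reassigned result otherwise keeps its original index labels (0, 2, 4, ...). The name mask_result is illustrative.

```python
import pandas as pd

# Same data as in the question
cols = ['signal', 'metabolite', 'adduct', 's_ind', 'm_ind', 'a_ind', 'distance']
data = [[0.500001, 1.000002, -0.5, 1, 1, 2, 0.000001],
        [0.500001, 0.000002, 0.5, 1, 2, 1, 0.000001],
        [0.500002, 1.000002, -0.5, 2, 1, 2, 0.000000],
        [0.500002, 0.000002, 0.5, 2, 2, 1, 0.000000],
        [0.500003, 1.000002, -0.5, 3, 1, 2, 0.000001],
        [0.500003, 0.000002, 0.5, 3, 2, 1, 0.000001],
        [1.000000, 1.000002, -0.5, 4, 1, 2, 0.499998],
        [1.000000, 0.000002, 0.5, 4, 2, 1, 0.499998],
        [0.000001, 1.000002, -0.5, 5, 1, 2, 0.500001],
        [0.000001, 0.000002, 0.5, 5, 2, 1, 0.500001]]
df = pd.DataFrame(data=data, columns=cols)

# Boolean-mask form: duplicated() flags repeats after the first, ~ inverts it
mask_result = df[~df.duplicated("signal")]
print(mask_result.equals(df.drop_duplicates("signal")))  # True

# ignore_index=True relabels the result 0..n-1 instead of keeping 0, 2, 4, ...
df = df.drop_duplicates("signal", ignore_index=True)
print(df.index.tolist())  # [0, 1, 2, 3, 4]
```

Use keep=False in either form (df.duplicated("signal", keep=False)) to drop every repeated row instead of keeping the first.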


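Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.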

粤ICP备18138465号  © 2020-2024 STACKOOM.COM