
Drop duplicates of one column based on value in another column, Python, Pandas

I have a dataframe like this:

Date                PlumeO      Distance
2014-08-13 13:48:00  754.447905 5.844577 
2014-08-13 13:48:00  754.447905 6.888653
2014-08-13 13:48:00  754.447905 6.938860
2014-08-13 13:48:00  754.447905 6.977284
2014-08-13 13:48:00  754.447905 6.946430 
2014-08-13 13:48:00  754.447905 6.345506
2014-08-13 13:48:00  754.447905 6.133567
2014-08-13 13:48:00  754.447905 5.846046 
2014-08-13 16:59:00  754.447905 6.345506 
2014-08-13 16:59:00  754.447905 6.694847 
2014-08-13 16:59:00  754.447905 5.846046 
2014-08-13 16:59:00  754.447905 6.977284 
2014-08-13 16:59:00  754.447905 6.938860 
2014-08-13 16:59:00  754.447905 5.844577 
2014-08-13 16:59:00  754.447905 6.888653 
2014-08-13 16:59:00  754.447905 6.133567 
2014-08-13 16:59:00  754.447905 6.946430

I'm trying to keep, for each date, the row with the smallest distance: drop the duplicate dates and keep only the one with the smallest distance.

Is there a way to achieve this with pandas' df.drop_duplicates, or am I stuck using if statements to find the smallest distance?

Sort by distance, then drop duplicates by date:

df.sort_values('Distance').drop_duplicates(subset='Date', keep='first')
Out: 
                   Date      PlumeO  Distance
0   2014-08-13 13:48:00  754.447905  5.844577
13  2014-08-13 16:59:00  754.447905  5.844577
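
For anyone who wants to try this end to end, here is a minimal sketch on a reduced sample modeled on the question's data (only six of its rows). The stable sort and the final sort_index() are additions not in the answer above: the first keeps tied distances in their original order, the second restores the original row order.

```python
import pandas as pd

# Reduced sample modeled on the question's data (a subset of its rows).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2014-08-13 13:48:00"] * 3 + ["2014-08-13 16:59:00"] * 3
    ),
    "PlumeO": [754.447905] * 6,
    "Distance": [5.844577, 6.888653, 6.938860, 6.345506, 6.694847, 5.846046],
})

# Sort so the smallest distance comes first, then keep the first row
# seen for each date. A stable sort leaves ties in their original order.
closest = (
    df.sort_values("Distance", kind="mergesort")
      .drop_duplicates(subset="Date", keep="first")
      .sort_index()  # optional: restore the original row order
)
print(closest)
```

Note that drop_duplicates keeps exactly one row per date, so if two rows tie for the minimum distance only one of them survives.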

The advantage of the following approaches is that they do not require a sort.

Option 1
You can identify the index labels of the minimum values with idxmin, and you can use it within a groupby. Use these results to slice your dataframe.

df.loc[df.groupby('Date').Distance.idxmin()]

                   Date      PlumeO  Distance
0   2014-08-13 13:48:00  754.447905  5.844577
13  2014-08-13 16:59:00  754.447905  5.844577
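
One thing to watch with Option 1: idxmin returns index labels, not positions, so it relies on the index being unique. The sketch below uses a made-up four-row frame with deliberately duplicated labels (an assumed scenario, not the question's data) and resets the index before selecting.

```python
import pandas as pd

# Hypothetical frame with non-unique index labels, for illustration only.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2014-08-13 13:48:00"] * 2 + ["2014-08-13 16:59:00"] * 2
        ),
        "Distance": [5.844577, 6.888653, 6.345506, 5.846046],
    },
    index=[0, 0, 1, 1],
)

# Make the labels unique first; otherwise .loc would return every row
# that shares a label with the minimum.
unique = df.reset_index(drop=True)

# One label per date: the row holding that date's smallest distance.
labels = unique.groupby("Date")["Distance"].idxmin()
closest = unique.loc[labels]
print(closest)
```

When several rows in a group tie for the minimum, idxmin reports the first one.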

Option 2
You can use pd.DataFrame.nsmallest to return the rows associated with the smallest distance.

df.groupby('Date', group_keys=False).apply(
    pd.DataFrame.nsmallest, n=1, columns='Distance'
)

                   Date      PlumeO  Distance
0   2014-08-13 13:48:00  754.447905  5.844577
13  2014-08-13 16:59:00  754.447905  5.844577
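
Both options return a single row per date. If you would rather keep every row that ties for the minimum, one variation (not taken from the answers here, and shown on made-up data) compares each distance with its group's minimum via transform('min'). It also avoids sorting.

```python
import pandas as pd

# Made-up data with a tie for the minimum on the first date.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2014-08-13 13:48:00"] * 3 + ["2014-08-13 16:59:00"] * 2
    ),
    "Distance": [5.8, 5.8, 6.9, 6.3, 5.9],
})

# transform('min') broadcasts each date's minimum back onto its rows,
# so the comparison yields a boolean mask aligned with df.
group_min = df.groupby("Date")["Distance"].transform("min")
closest_with_ties = df[df["Distance"] == group_min]
print(closest_with_ties)
```

The exact equality is safe here because each group's minimum is one of the stored values, not a recomputed number.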

I would say sort the data first, then drop the duplicate dates:

stripped_data = df.sort_values('Distance').drop_duplicates('Date', keep='first')


