Remove rows from a pandas dataframe with conditions

Question

I have a dataframe that looks like this:

import pandas as pd

### create toy data set
data = [[1111,'10/1/2021',21,123],
        [1111,'10/1/2021',-21,123],
        [1111,'10/1/2021',21,123],
        [2222,'10/2/2021',15,234],
        [2222,'10/2/2021',15,234],
        [3333,'10/3/2021',15,234],
        [3333,'10/3/2021',15,234]]

df = pd.DataFrame(data,columns = ['Individual','date','number','cc'])

What I want to do is remove rows where Individual, date, and cc are the same, but number is a negative value in one case and a positive value in the other. For example, in the first three rows, I would remove rows 1 and 2 (because the 21 and -21 values are equal in absolute terms), but I don't want to remove row 3 (because I have already accounted for the negative value in row 2 by eliminating row 1). Also, I don't want to remove duplicated values if the corresponding number values are positive. I have tried a variety of duplicated() approaches, but just can't get it right.

Expected results would be:

   Individual       date  number   cc
0        1111  10/1/2021      21  123
1        2222  10/2/2021      15  234
2        2222  10/2/2021      15  234
3        3333  10/3/2021      15  234
4        3333  10/3/2021      15  234

Thus, the first two rows are removed, but not the third row, since the negative value is already accounted for.

Any assistance would be appreciated. I am trying to do this without a loop, but it may be unavoidable. It seems similar to this question, but I can't figure out how to make it work in my case, as I am trying to avoid loops.

Answer 1

I can't be sure since you did not post your expected output, but you could try the below. Create a separate df called n that contains the rows with a negative 'number', and left-join it to the original with indicator=True.

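# rows with a negative 'number', keeping only the key columns
n = df.loc[df.number.lt(0)].drop('number',axis=1)
# left join on the shared key columns; _merge flags groups that have a negative row
df = pd.merge(df,n,'left',indicator=True)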

>>> df

   Individual       date  number   cc     _merge
0        1111  10/1/2021      21  123       both
1        1111  10/1/2021     -21  123       both
2        1111  10/1/2021      21  123       both
3        2222  10/2/2021      15  234  left_only
4        2222  10/2/2021      15  234  left_only
5        3333  10/3/2021      15  234  left_only
6        3333  10/3/2021      15  234  left_only

This will allow us to identify the Individual/date/cc groups that have a negative 'number' row.
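The same group-level flag can probably also be computed without a merge. The following is only a sketch under the assumption that Individual, date and cc are the grouping keys; the names keys, neg and has_negative are introduced here for illustration and are not part of the code above.

```python
import pandas as pd

# toy data from the question
data = [[1111, '10/1/2021', 21, 123],
        [1111, '10/1/2021', -21, 123],
        [1111, '10/1/2021', 21, 123],
        [2222, '10/2/2021', 15, 234],
        [2222, '10/2/2021', 15, 234],
        [3333, '10/3/2021', 15, 234],
        [3333, '10/3/2021', 15, 234]]
df = pd.DataFrame(data, columns=['Individual', 'date', 'number', 'cc'])

keys = ['Individual', 'date', 'cc']  # assumed grouping keys

# True for every row whose Individual/date/cc group contains a negative number
has_negative = (df.assign(neg=df['number'].lt(0))
                  .groupby(keys)['neg']
                  .transform('any'))

print(df[has_negative])
```

Unlike the merge, this does not add rows if a group happens to contain more than one negative entry.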

Then you can locate the rows with 'both' in _merge, and only use those to perform a groupby.head(2), concatenating that with the rest of the df:

out = pd.concat([df.loc[df._merge.eq('both')].groupby(['Individual','date','cc']).head(2),
           df.loc[df._merge.ne('both')]]).drop('_merge',axis=1)

Which prints:

   Individual       date  number   cc
0        1111  10/1/2021      21  123
1        1111  10/1/2021     -21  123
3        2222  10/2/2021      15  234
4        2222  10/2/2021      15  234
5        3333  10/3/2021      15  234
6        3333  10/3/2021      15  234
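Note that the output above still contains the -21 row, whereas the expected result in the question drops it together with one matching 21. One way this might be done without an explicit loop is to number the repeats of each value inside a group and cancel the k-th positive against the k-th negative of the same absolute value. This is a minimal sketch, not a tested general solution; the helper columns occ and abs_number and the name pair_cols are made up for illustration.

```python
import pandas as pd

# toy data from the question
data = [[1111, '10/1/2021', 21, 123],
        [1111, '10/1/2021', -21, 123],
        [1111, '10/1/2021', 21, 123],
        [2222, '10/2/2021', 15, 234],
        [2222, '10/2/2021', 15, 234],
        [3333, '10/3/2021', 15, 234],
        [3333, '10/3/2021', 15, 234]]
df = pd.DataFrame(data, columns=['Individual', 'date', 'number', 'cc'])

keys = ['Individual', 'date', 'cc']

tmp = df.copy()
# occ: 0 for the first time a given signed value appears in its group, 1 for the second, ...
tmp['occ'] = tmp.groupby(keys + ['number']).cumcount()
tmp['abs_number'] = tmp['number'].abs()

# the k-th +v and the k-th -v of a group share the same (keys, abs_number, occ);
# such a slot holds two distinct signed values exactly when a pair cancels out
pair_cols = keys + ['abs_number', 'occ']
n_signs = tmp.groupby(pair_cols)['number'].transform('nunique')

# keep only rows that have no opposite-sign partner
out = df[n_signs.eq(1)].reset_index(drop=True)
print(out)
```

On the toy data this keeps the third 1111 row and all of the 2222 and 3333 rows. Each negative removes at most one positive, so duplicated positive values without a negative counterpart are left alone.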
