Remove rows in a pandas DataFrame based on paired values

Question

I have a dataframe as below:

df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                 'Type':['101','102','101','101','101','102'],
                 'Qty':[10, -10, 10, 30, 5, -5]})

I want to remove each pair of rows with df['Type'] = 101 and 102 whose df['Qty'] values net each other off. The end result would be:

df = pd.DataFrame({'User':['a','b'],
                     'Type':['101', '101'],
                     'Qty':[10, 30]})

I tried converting the negative values to absolute numbers and removing duplicates:

df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')

But that wrongly gives me this dataframe:

df = pd.DataFrame({'User':['a','b', 'b'],
                     'Type':['101', '101', '101'],
                     'Qty':[10, 30, 5]})
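The attempt fails because drop_duplicates with keep='first' always retains one row for each distinct value, and it looks only at 'Qty', ignoring 'User' and 'Type'. A minimal reproduction, assuming the sample data from the question (the names attempt and result are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

# Work on a copy so the original signs in df are preserved
attempt = df.assign(Qty=df['Qty'].abs())

# keep='first' retains one row per distinct absolute quantity, so the
# 5 / -5 pair of user b collapses to a single row instead of disappearing
result = attempt.drop_duplicates(subset=['Qty'], keep='first')
print(result)
```

Rows 0, 3 and 4 survive, which is why the unwanted quantity 5 is still present.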

Answer 1

The idea is to build every two-row combination of index values within each group, and to collect in a set each pair that contains both Type values and whose Qty sums to 0:

# the solution needs unique index values
df = df.reset_index(drop=True)

from itertools import combinations

out = set()
def f(x):
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        if (set(a['Type']) == set(['101','102'])) and (a['Qty'].sum() == 0):
            out.add(i)

df.groupby('User').apply(f)

print (out)
{(0, 1), (4, 5), (1, 2)}

Then, if an index value appears in more than one pair, as 1 does in (1, 2) here, remove the later pair entirely:

s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0    0
0    1
1    4
1    5
dtype: object

Finally, remove those rows from the original:

df = df.drop(final)
print (df)
  User Type  Qty
2    a  101   10
3    b  101   30
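Because a Python set has no guaranteed iteration order, which of two overlapping pairs survives the duplicate check can depend on that order. One possible way to make the result deterministic is to claim pairs greedily in index order inside a single function. This is only a sketch of the same combinations idea; the name drop_netted_pairs and its types parameter are hypothetical, not part of the answer:

```python
from itertools import combinations

import pandas as pd


def drop_netted_pairs(df, types=('101', '102')):
    """Hypothetical helper: per user, drop pairs of rows that hold both
    types and whose Qty sums to zero, claiming pairs in index order."""
    df = df.reset_index(drop=True)  # the approach needs unique index labels
    used = set()                    # index labels already claimed by a pair
    for _, group in df.groupby('User'):
        for i, j in combinations(group.index, 2):
            if i in used or j in used:
                continue            # each row may belong to one pair only
            pair = group.loc[[i, j]]
            if set(pair['Type']) == set(types) and pair['Qty'].sum() == 0:
                used.update((i, j))
    return df.drop(index=sorted(used))


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
cleaned = drop_netted_pairs(df)
print(cleaned)
```

On the sample data this keeps rows 2 and 3, the same rows as the output above.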

Answer 2

If there are only two 'Type' values (in this case 101 and 102), you could write a custom function as follows:

Build a dictionary whose keys are the absolute values of 'Qty'.
Each value of the dictionary is a list of the 'Type' values seen for that 'Qty'.

from collections import defaultdict
def f(x):
    new = defaultdict(list)
    for k,v in x[['Type', 'Qty']].itertuples(index=None,name=None):
        if not new[abs(v)]:
            new[abs(v)].append(k)
        elif new[abs(v)][-1] !=k:
            new[abs(v)].pop()
        else:
            new[abs(v)].append(k)
    return pd.Series(new,name='Qty').rename_axis(index='Type')

The logic is simple:

Whenever a new key is encountered, add its corresponding 'Type' to the list.
If the key already exists, check whether the last 'Type' added earlier is equal to the current 'Type'. If they don't match, for example new = {10:['101']} and the current 'Type' is '102', remove '101'. So, new = {10:[]}.
If the key already exists and the last 'Type' matches the current 'Type', simply append the current 'Type' to the list. For example, if new = {10:['101']} and the current 'Type' is '101', append to it. So, new = {10:['101', '101']}.
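The three rules can be traced on plain tuples without pandas. The sketch below restates them with a hypothetical helper, surviving_types, that takes one user's (type, qty) pairs. Note that, as in the function above, only the absolute quantity and the type are compared, so this assumes opposite types always carry opposite signs:

```python
from collections import defaultdict


def surviving_types(rows):
    """Hypothetical helper: rows is an iterable of (type, qty) tuples
    belonging to a single user."""
    new = defaultdict(list)
    for typ, qty in rows:
        stack = new[abs(qty)]
        if stack and stack[-1] != typ:
            stack.pop()        # opposite type seen: cancel the latest entry
        else:
            stack.append(typ)  # first entry, or same type as the last one
    return dict(new)


user_a = surviving_types([('101', 10), ('102', -10), ('101', 10)])
user_b = surviving_types([('101', 30), ('101', 5), ('102', -5)])
print(user_a)  # {10: ['101']}
print(user_b)  # {30: ['101'], 5: []}
```

An empty list means every row with that absolute quantity was cancelled, which is why the later explode().dropna() step discards it.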

df.groupby('User').apply(f).explode().dropna().reset_index()

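  User  Type  Qty
0    a    10  101
1    b    30  101

Note that the two column labels come out swapped: the function names the Series 'Qty' and its index 'Type', so the column labelled Type holds the absolute quantity and the column labelled Qty holds the type.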

Answer 3

Iterating over all records and saving the matches in a list, while ensuring that no index is paired more than once, seems to work here.


import pandas as pd

df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                 'Type':['101','102','101','101','101','102'],
                 'Qty':[10, -10, 10, 30, 5, -5]})



# create a list to collect all indices that we are going to remove
records_to_remove = []
# a dictionary to map which group mirrors the other
pair = {'101': '102', '102':'101'}

# let's go over each row one by one
for i in df.index:
    # use label-based lookup, since i comes from df.index
    current_record = df.loc[i]
    # if we haven't stored this index already for removal
    if i not in records_to_remove:
        pair_type = pair[current_record['Type']]
        pair_quantity = -1*current_record['Qty']
        # search for all possible matches to this row
        match_records = df[(df['Type']==pair_type) & (df['Qty']==pair_quantity)]
        # keep only the matches that have not been paired already
        free_matches = [j for j in match_records.index if j not in records_to_remove]
        if not free_matches:
            # if no unpaired match is found, move on to the next row
            continue
        # otherwise pair this row with the first free match and mark both for removal
        records_to_remove.append(i)
        records_to_remove.append(free_matches[0])
                
df = df.drop(records_to_remove)

Output:

   User Type  Qty
2     a  101   10
3     b  101   30

See if this works for you!
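The loop above filters the whole frame once per row, which may become slow on large data. As a possible alternative that is not taken from any of the answers, the pairing can be expressed with groupby and cumcount. This is a sketch under two assumptions: 'Type' contains only '101' and '102', and pairs are formed within the same 'User' (as Answer 1 does). The helper name drop_offsetting_rows and the temporary columns _key and _n are hypothetical:

```python
import pandas as pd


def drop_offsetting_rows(df):
    """Hypothetical helper; assumes 'Type' holds only '101' and '102'."""
    work = df.copy()
    # A '101' row with Qty q can only offset a '102' row with Qty -q,
    # so flip the sign on '102' rows to give both rows the same key
    work['_key'] = work['Qty'].where(work['Type'] == '101', -work['Qty'])
    # Number the rows inside each (User, key, Type) group: 0, 1, 2, ...
    work['_n'] = work.groupby(['User', '_key', 'Type']).cumcount()
    # The k-th '101' row pairs with the k-th '102' row of the same user and
    # key; a slot that contains both types is a complete pair
    types_in_slot = (work.groupby(['User', '_key', '_n'])['Type']
                         .transform('nunique'))
    return df[(types_in_slot < 2).to_numpy()]


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
remaining = drop_offsetting_rows(df)
print(remaining)
```

On the sample data this keeps rows 2 and 3. Unlike the loop, it never pairs rows that belong to different users.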


Answer 1
3 Accepted 2020-07-02 06:25:49

Answer 2
2 2020-07-02 09:18:53

Answer 3
2 2020-07-02 09:22:31
