Deleting rows in pandas dataframe based on pair value

I have a dataframe as below:

df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                 'Type':['101','102','101','101','101','102'],
                 'Qty':[10, -10, 10, 30, 5, -5]})

I want to remove pairs of rows with df['Type'] = 101 and 102 whose df['Qty'] values net off each other. The end result would be:

df = pd.DataFrame({'User':['a','b'],
                     'Type':['101', '101'],
                     'Qty':[10, 30]})

I tried converting the negative values to absolute values and removing duplicates:

df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')

But that wrongly gives me this dataframe:

df = pd.DataFrame({'User':['a','b', 'b'],
                     'Type':['101', '101', '101'],
                     'Qty':[10, 30, 5]})
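
To see why that attempt misses, here is a small reproduction on the sample data. The name `attempt` is just an illustrative variable. drop_duplicates keeps the first row for each absolute quantity, so it never removes both members of an offsetting pair, and it ignores 'User' and 'Type' entirely.

```python
import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

# same steps as the attempt above, done on a copy so df stays intact
attempt = (df.assign(Qty=df['Qty'].abs())
             .drop_duplicates(subset=['Qty'], keep='first'))

# rows 0, 3 and 4 survive: one row per distinct absolute quantity
print(attempt)
```

Row 4 (Qty 5) survives even though row 5 offsets it, which is the unwanted extra row in the result above.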

The idea is to create combinations of index values per group and test whether each candidate pair contains both Type values and has Qty summing to 0. The matching pairs are collected in a set:

# the solution needs unique index values
df = df.reset_index(drop=True)

from itertools import combinations

out = set()

def f(x):
    # test every pair of rows within one User group
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        # keep the pair if it has one row of each Type and the quantities cancel
        if (set(a['Type']) == {'101', '102'}) and (a['Qty'].sum() == 0):
            out.add(i)

df.groupby('User').apply(f)

print (out)
{(0, 1), (4, 5), (1, 2)}

Then remove every pair that reuses an index value already seen, like (1, 2) here:

s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0    0
0    1
1    4
1    5
dtype: object

Finally, remove these rows from the original:

df = df.drop(final)
print (df)
  User Type  Qty
2    a  101   10
3    b  101   30
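
Which pair survives the duplicate filter depends on the iteration order of the set, so the remaining row for user a could be index 0 or index 2. A greedy variant is one way to make the choice deterministic. This is only a sketch, not the method above: `offsetting_rows` is a hypothetical helper name, and it assumes the only types involved are '101' and '102'.

```python
import pandas as pd
from itertools import combinations

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

def offsetting_rows(frame):
    """Return sorted index labels of rows that form offsetting pairs (hypothetical helper)."""
    used = set()
    for _, group in frame.groupby('User'):
        # combinations() walks pairs in index order, so earlier rows pair first
        for i, j in combinations(group.index, 2):
            if i in used or j in used:
                continue  # each row may belong to at most one pair
            a, b = frame.loc[i], frame.loc[j]
            if {a['Type'], b['Type']} == {'101', '102'} and a['Qty'] + b['Qty'] == 0:
                used.update((i, j))
    return sorted(used)

to_drop = offsetting_rows(df)
result = df.drop(to_drop)
print(result)
```

On the sample data this drops rows 0, 1, 4 and 5 and keeps rows 2 and 3.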

If there are only two 'Type's (in this case 101 and 102), you could write a custom function as follows:

  • Build a dictionary whose keys are the absolute values of 'Qty'.
  • Each value of the dictionary is a list of the 'Type' values corresponding to that 'Qty'.

from collections import defaultdict

def f(x):
    new = defaultdict(list)
    for k, v in x[['Type', 'Qty']].itertuples(index=False, name=None):
        if not new[abs(v)]:
            # no unmatched entry for this absolute quantity yet
            new[abs(v)].append(k)
        elif new[abs(v)][-1] != k:
            # opposite 'Type': cancel the most recent entry
            new[abs(v)].pop()
        else:
            new[abs(v)].append(k)
    # keys are absolute quantities and values are lists of 'Type'
    return pd.Series(new, name='Type').rename_axis(index='Qty')

The logic is simple:

  • Whenever a new key is encountered, add its corresponding 'Type' to the list.
  • If the key already exists, check whether the last 'Type' added is equal to the current 'Type'. If they don't match, remove the last one. For example, if new = {10:['101']} and the current 'Type' is '102', remove '101', so new = {10:[]}.
  • If the key already exists and the last 'Type' matches the current 'Type', simply append the current 'Type' to the list. For example, if new = {10:['101']} and the current 'Type' is '101', append it, so new = {10:['101', '101']}.

df.groupby('User').apply(f).explode().dropna().reset_index()

  User  Qty Type
0    a   10  101
1    b   30  101
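
The cancellation logic can be traced without pandas. The sketch below is a plain-Python restatement of the same loop for illustration; `cancel` and the two row lists are hypothetical names holding the (Type, Qty) values of each user from the sample data.

```python
from collections import defaultdict

def cancel(rows):
    """Apply the stack-style cancellation to (Type, Qty) tuples (hypothetical helper)."""
    new = defaultdict(list)
    for typ, qty in rows:
        stack = new[abs(qty)]
        if stack and stack[-1] != typ:
            stack.pop()        # a different Type cancels the latest entry
        else:
            stack.append(typ)  # first entry, or same Type as the latest one
    return dict(new)

rows_a = [('101', 10), ('102', -10), ('101', 10)]
rows_b = [('101', 30), ('101', 5), ('102', -5)]

print(cancel(rows_a))  # {10: ['101']}
print(cancel(rows_b))  # {30: ['101'], 5: []}
```

Note that the match is made on the absolute value and a differing 'Type' only. The sign of 'Qty' is never checked, so this relies on 101 and 102 rows carrying opposite signs.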

Iterating over all records and saving matches in a list, while ensuring that no index is paired more than once, seems to work here.


import pandas as pd

df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                 'Type':['101','102','101','101','101','102'],
                 'Qty':[10, -10, 10, 30, 5, -5]})



# create a list to collect all indices that we are going to remove
records_to_remove = []
# a dictionary to map which group mirrors the other
pair = {'101': '102', '102':'101'}

# let's go over each row one by one,
for i in df.index:
    current_record = df.loc[i]
    # if we haven't stored this index already for removal
    if i not in records_to_remove:
        pair_type = pair[current_record['Type']]
        pair_quantity = -1*current_record['Qty']
        # search for all possible matches to this row
        match_records = df[(df['Type']==pair_type) & (df['Qty']==pair_quantity)]
        if match_records.empty:
            # if no matches found, move on to the next row
            continue
        else:
            # if a match is found, take the first of such records
            first_match_index = match_records.index[0]
            if first_match_index not in records_to_remove:
                # store the indices in the list to remove only if they're not already present
                records_to_remove.append(i)
                records_to_remove.append(first_match_index)
                
df = df.drop(records_to_remove)

Output:

   User Type  Qty
2     a  101   10
3     b  101   30

See if this works for you!
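
For larger frames, a loop-free alternative may be worth trying. The sketch below is a different technique from the loop above: it numbers repeated rows with groupby(...).cumcount() and flags keys that occur on both sides. It assumes the only types are '101' and '102' and that pairs should match within the same 'User'. The names `n`, `key_qty`, `keys` and `paired` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

# occurrence number of each row within its (User, Type, Qty) combination
n = df.groupby(['User', 'Type', 'Qty']).cumcount()

# flip the sign of the non-101 rows so both sides of a pair share one key
key_qty = df['Qty'].where(df['Type'] == '101', -df['Qty'])

keys = pd.DataFrame({'User': df['User'], 'key_qty': key_qty, 'n': n})

# a key is unique within one Type, so a repeated key means
# one '101' row and one '102' row that offset each other
paired = keys.duplicated(keep=False)

result = df[~paired]
print(result)
```

The k-th occurrence of a 101 row is matched with the k-th occurrence of the offsetting 102 row, so an unmatched extra row (index 2 here) is kept.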
