I have a dataframe as below:
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
'Type':['101','102','101','101','101','102'],
'Qty':[10, -10, 10, 30, 5, -5]})
I want to remove pairs of rows with df['Type'] equal to 101 and 102 whose df['Qty'] values net each other off. The end result would be:
df = pd.DataFrame({'User':['a','b'],
'Type':['101', '101'],
'Qty':[10, 30]})
I tried converting the negative values to absolute values and removing duplicates:
df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')
But that wrongly gives me this dataframe:
df = pd.DataFrame({'User':['a','b', 'b'],
'Type':['101', '101', '101'],
'Qty':[10, 30, 5]})
The idea is to create combinations of index values per group and test whether each pair contains both Types and sums to 0, collecting the matched pairs in a set:
# the solution needs unique index values
df = df.reset_index(drop=True)
from itertools import combinations
out = set()
def f(x):
    # test every pair of rows within one User group
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        if (set(a['Type']) == set(['101','102'])) and (a['Qty'].sum() == 0):
            out.add(i)
df.groupby('User').apply(f)
print (out)
{(0, 1), (4, 5), (1, 2)}
Then drop any pair that repeats an index value already used by another pair, here (1, 2):
s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0    0
0    1
1    4
1    5
dtype: object
Finally, remove those rows from the original dataframe:
df = df.drop(final)
print (df)
  User Type  Qty
2    a  101   10
3    b  101   30
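The steps above can also be wrapped into one function. This is only a sketch of a variation, not the answer's exact method: instead of collecting all pairs in a set and dropping the overlapping ones afterwards, it accepts pairs greedily in index order and skips any pair that reuses a row. The function name `drop_offsetting_pairs` and its `types` parameter are made up for illustration.

```python
from itertools import combinations

import pandas as pd


def drop_offsetting_pairs(df, types=("101", "102")):
    """Drop row pairs per User that have both Types and whose Qty sums to 0."""
    df = df.reset_index(drop=True)  # the approach needs unique index labels
    used = set()                    # index labels already assigned to a pair
    for _, group in df.groupby("User"):
        for i, j in combinations(group.index, 2):
            if i in used or j in used:
                continue  # each row may cancel at most one other row
            pair = group.loc[[i, j]]
            if set(pair["Type"]) == set(types) and pair["Qty"].sum() == 0:
                used.update((i, j))
    return df.drop(index=list(used))


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
result = drop_offsetting_pairs(df)
print(result)
```

Because pairs are accepted in a fixed order, the result does not depend on the iteration order of a set. Like the original, it checks every pair inside a group, so it gets slow for large groups.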
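If there are only two 'Type's (in this case 101 and 102), then you could write a custom function as follows:
- Build a dictionary whose keys are the absolute values of 'Qty'.
- Each value in the dictionary is a list of the 'Type' values corresponding to that 'Qty'.
from collections import defaultdict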
def f(x):
    new = defaultdict(list)
    for k, v in x[['Type', 'Qty']].itertuples(index=False, name=None):
        if not new[abs(v)]:
            new[abs(v)].append(k)
        elif new[abs(v)][-1] != k:
            new[abs(v)].pop()
        else:
            new[abs(v)].append(k)
    return pd.Series(new, name='Qty').rename_axis(index='Type')
The logic is simple:
- If the list for the current absolute 'Qty' is empty, append the current 'Type' to the list.
- Otherwise, check whether the 'Type' which was added earlier is equal to the current 'Type' value. If they don't match, for example if new = {10:['101']} and the current 'Type' is '102', remove '101'. So, new = {10:[]}.
- If the previous 'Type' and the current 'Type' match, simply append the current 'Type' to the list. For example, if new = {10:['101']} and the current 'Type' is '101', then append to it. So, new = {10:['101', '101']}.
df.groupby('User').apply(f).explode().dropna().reset_index()
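  User  Type  Qty
0    a    10  101
1    b    30  101
Note that the column labels come out swapped here: the column labelled Type holds the quantity and Qty holds the type, because f names the Series 'Qty' and its index 'Type'. Swapping those two names in f puts the labels the right way round.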
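To see the cancelling logic in isolation, here is a small pandas-free sketch that runs the same three rules on plain (Type, Qty) tuples for each user. The helper name `cancel` is hypothetical, and the two branches that append are merged into one `else`.

```python
from collections import defaultdict


def cancel(rows):
    """rows: iterable of (type, qty) tuples belonging to one user."""
    new = defaultdict(list)
    for k, v in rows:
        stack = new[abs(v)]
        if stack and stack[-1] != k:
            stack.pop()      # opposite Type with the same |Qty|: cancel it
        else:
            stack.append(k)  # empty list or same Type: keep it
    return dict(new)


user_a = [("101", 10), ("102", -10), ("101", 10)]
user_b = [("101", 30), ("101", 5), ("102", -5)]
print(cancel(user_a))  # {10: ['101']}
print(cancel(user_b))  # {30: ['101'], 5: []}
```

The matching looks only at abs(Qty), so a '101' row and a '102' row with the same sign and magnitude would cancel as well. The original index and the sign of 'Qty' are not kept in the result.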
Iterating over all records and saving matches in a list that ensures no index is paired more than once seems to work here.
import pandas as pd
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                   'Type':['101','102','101','101','101','102'],
                   'Qty':[10, -10, 10, 30, 5, -5]})
# create a list to collect all indices that we are going to remove
records_to_remove = []
# a dictionary to map which group mirrors the other
pair = {'101': '102', '102':'101'}
# let's go over each row one by one
for i in df.index:
    # look the row up by label, since i comes from df.index
    current_record = df.loc[i]
    # if we haven't stored this index already for removal
    if i not in records_to_remove:
        pair_type = pair[current_record['Type']]
        pair_quantity = -1*current_record['Qty']
        # search for all possible matches to this row
        match_records = df[(df['Type']==pair_type) & (df['Qty']==pair_quantity)]
        if match_records.empty:
            # if no matches are found, move on to the next row
            continue
        else:
            # if a match is found, take the first of such records
            first_match_index = match_records.index[0]
            if first_match_index not in records_to_remove:
                # store the indices in the list to remove only if they're not already present
                records_to_remove.append(i)
                records_to_remove.append(first_match_index)
df = df.drop(records_to_remove)
Output:
  User Type  Qty
2    a  101   10
3    b  101   30
See if this works for you!
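The loop above searches for matches across all users and only ever looks at the first match. If pairs should only net off within the same 'User', and a later unused match should be tried when the first one is already taken, a variation could look like the sketch below. The function name `drop_pairs_by_user` is made up for illustration, and the sketch assumes unique index labels and that every 'Type' appears as a key in `pair`.

```python
import pandas as pd


def drop_pairs_by_user(df, pair=None):
    """Drop offsetting rows, matching only within the same User."""
    pair = pair or {'101': '102', '102': '101'}
    removed = set()  # index labels already matched
    for i, row in df.iterrows():
        if i in removed:
            continue
        # rows of the same user with the mirrored Type and the opposite Qty
        candidates = df.index[
            (df['User'] == row['User'])
            & (df['Type'] == pair[row['Type']])
            & (df['Qty'] == -row['Qty'])
        ]
        # take the first candidate that has not been used yet
        match = next((j for j in candidates if j not in removed), None)
        if match is not None:
            removed.update((i, match))
    return df.drop(index=list(removed))


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
result = drop_pairs_by_user(df)
print(result)
```

On the sample data this leaves the same two rows (index 2 and 3) as the loop above.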