Deleting rows in pandas dataframe based on pair value
I have a dataframe as below:
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
'Type':['101','102','101','101','101','102'],
'Qty':[10, -10, 10, 30, 5, -5]})
I want to remove pairs of rows with df['Type'] = 101 and 102 whose df['Qty'] values net each other off. The end result would be:
df = pd.DataFrame({'User':['a','b'],
'Type':['101', '101'],
'Qty':[10, 30]})
I tried converting the negative values to absolute numbers and removing duplicates:
df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')
But that wrongly gives me this dataframe:
df = pd.DataFrame({'User':['a','b', 'b'],
'Type':['101', '101', '101'],
'Qty':[10, 30, 5]})
The idea is to create combinations of index values within each group and test whether each pair contains both Types and whether its Qty sum is 0, collecting the matched pairs in a set:
# the solution needs unique index values
df = df.reset_index(drop=True)
from itertools import combinations
out = set()
def f(x):
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        if (set(a['Type']) == set(['101','102'])) and (a['Qty'].sum() == 0):
            out.add(i)
df.groupby('User').apply(f)
print (out)
{(0, 1), (4, 5), (1, 2)}
Then remove every pair that repeats an index value already used by another pair, like (1, 2) here:
s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0 0
0 1
1 4
1 5
dtype: object
And last, remove the rows from the original:
df = df.drop(final)
print (df)
User Type Qty
2 a 101 10
3 b 101 30
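The steps above can also be sketched as one self-contained script. This is only a minimal sketch, not the answer's exact code: the helper name offsetting_pairs and the used set are hypothetical, and instead of the explode/duplicated step it keeps a pair only if neither of its indices has been taken by an earlier pair, which avoids depending on the iteration order of a set.

```python
from itertools import combinations

import pandas as pd


def offsetting_pairs(group, types=('101', '102')):
    """Yield index pairs whose Types are the two given ones and whose Qty sums to 0."""
    for i, j in combinations(group.index, 2):
        pair = group.loc[[i, j]]
        if set(pair['Type']) == set(types) and pair['Qty'].sum() == 0:
            yield (i, j)


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
# the pairing relies on unique index values
df = df.reset_index(drop=True)

used = set()  # indices that already belong to a removed pair
for _, group in df.groupby('User'):
    for i, j in offsetting_pairs(group):
        # accept a pair only if neither row was matched before
        if i not in used and j not in used:
            used.update((i, j))

result = df.drop(index=sorted(used))
print(result)
```

On the sample data this leaves the rows with index 2 and 3, matching the result above.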
If there are only two 'Type's (in this case 101 and 102), then you could write a custom function as follows:
- Build a dictionary keyed by the absolute value of 'Qty'.
- The values of the dictionary hold the list of 'Type' values corresponding to that 'Qty'.
from collections import defaultdict
def f(x):
    new = defaultdict(list)
    for k,v in x[['Type', 'Qty']].itertuples(index=None,name=None):
        if not new[abs(v)]:
            new[abs(v)].append(k)
        elif new[abs(v)][-1] !=k:
            new[abs(v)].pop()
        else:
            new[abs(v)].append(k)
    return pd.Series(new,name='Qty').rename_axis(index='Type')
The logic is simple:
- If the list for the current absolute 'Qty' is empty, add the current 'Type' to the list.
- Otherwise, check whether the 'Type' which was added earlier is equal to the current 'Type' value.
- If they don't match, remove the earlier one. For example, if new = {10:['101']} and the current key is '102', remove '101'. So, new = {10:[]}.
- If the previous 'Type' and the current 'Type' match, simply append the current 'Type' to the list. For example, if new = {10:['101']} and the current 'Type' is '101', then append to it. So, new = {10:['101', '101']}.
df.groupby('User').apply(f).explode().dropna().reset_index()
User Type Qty
0 a 10 101
1 b 30 101
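To make the append/pop logic concrete, here is a minimal sketch that runs the same cancellation on each user's rows and returns the plain dictionary, without the groupby/apply step. The name surviving_types is hypothetical, and, like the function above, it assumes only two 'Type' values exist and it keys on the absolute 'Qty', so the sign of the quantity is not kept.

```python
from collections import defaultdict

import pandas as pd


def surviving_types(group):
    """Map each absolute Qty to the Types left after opposite Types cancel."""
    stacks = defaultdict(list)
    for typ, qty in group[['Type', 'Qty']].itertuples(index=False, name=None):
        stack = stacks[abs(qty)]
        if stack and stack[-1] != typ:
            # the previous Type differs from the current one: they cancel
            stack.pop()
        else:
            # empty list or same Type as before: keep the current Type
            stack.append(typ)
    return dict(stacks)


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

result_a = surviving_types(df[df['User'] == 'a'])
result_b = surviving_types(df[df['User'] == 'b'])
print(result_a)  # {10: ['101']}
print(result_b)  # {30: ['101'], 5: []}
```

The empty list for 5 is what explode() turns into a missing value and dropna() then removes. In the output above, the column labelled Type holds the quantity and the column labelled Qty holds the type, which follows from the name and rename_axis arguments used in f.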
Iterating over all records and saving matches in a list, while ensuring that no index is paired more than once, seems to work here.
import pandas as pd
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                   'Type':['101','102','101','101','101','102'],
                   'Qty':[10, -10, 10, 30, 5, -5]})
# create a list to collect all indices that we are going to remove
records_to_remove = []
# a dictionary to map which group mirrors the other
pair = {'101': '102', '102':'101'}
# let's go over each row one by one,
for i in df.index:
    # look the row up by label, since i comes from df.index
    current_record = df.loc[i]
    # if we haven't stored this index already for removal
    if i not in records_to_remove:
        pair_type = pair[current_record['Type']]
        pair_quantity = -1*current_record['Qty']
        # search for all possible matches to this row
        match_records = df[(df['Type']==pair_type) & (df['Qty']==pair_quantity)]
        if match_records.empty:
            # if no matches found move on to the next row
            continue
        else:
            # if a match is found, take the first of such records
            first_match_index = match_records.index[0]
            if first_match_index not in records_to_remove:
                # store the indices in the list to remove only if they're not already present
                records_to_remove.append(i)
                records_to_remove.append(first_match_index)
df = df.drop(records_to_remove)
Output:
User Type Qty
2 a 101 10
3 b 101 30
See if this works for you!
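The same row matching can also be sketched without an explicit Python loop. This is a hedged alternative rather than the code above: the function name drop_offsetting_pairs and the helper columns _key and _n are hypothetical, and it assumes the index is unique. It flips the sign of the '102' quantities so that an offsetting pair shares one key, numbers repeated rows within each Type using groupby(...).cumcount(), and treats the k-th '101' row and the k-th '102' row with the same key as a pair.

```python
import pandas as pd


def drop_offsetting_pairs(df, type_a='101', type_b='102'):
    """Drop rows of type_a and type_b that net off per User (assumes a unique index)."""
    # only rows of the two Types can take part in a pair
    sub = df[df['Type'].isin([type_a, type_b])].copy()
    # flip the sign for type_b so that +q (type_a) and -q (type_b) share a key
    sub['_key'] = sub['Qty'].where(sub['Type'] == type_a, -sub['Qty'])
    # number repeated (User, key) rows separately within each Type
    sub['_n'] = sub.groupby(['User', '_key', 'Type']).cumcount()
    # the same (User, key, n) can only repeat across the two Types
    paired = sub.duplicated(subset=['User', '_key', '_n'], keep=False)
    return df.drop(sub.index[paired])


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
result = drop_offsetting_pairs(df)
print(result)
```

On the sample data this also keeps only the rows with index 2 and 3. Which of several identical rows survives may differ from the loop-based version on other data, because rows are paired by their order of appearance within each Type.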