Deleting rows in pandas dataframe based on pair value
I have a DataFrame as follows:
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
'Type':['101','102','101','101','101','102'],
'Qty':[10, -10, 10, 30, 5, -5]})
I want to delete pairs of rows where df['Type'] is 101 and 102 and the df['Qty'] values cancel each other out. The final result would look like this:
df = pd.DataFrame({'User':['a','b'],
'Type':['101', '101'],
'Qty':[10, 30]})
I tried converting the negative values to absolute values and dropping duplicates:
df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')
But that incorrectly gives me a DataFrame like this:
df = pd.DataFrame({'User':['a','b', 'b'],
'Type':['101', '101', '101'],
'Qty':[10, 30, 5]})
The idea is to build, for each group, every 2-element combination of index values, then test whether each pair contains both Types and whether the Qty values of the matching pair sum to 0:
# the solution needs unique index values
df = df.reset_index(drop=True)
from itertools import combinations
out = set()
def f(x):
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        if (set(a['Type']) == set(['101','102'])) and (a['Qty'].sum() == 0):
            out.add(i)
df.groupby('User').apply(f)
print (out)
{(0, 1), (4, 5), (1, 2)}
If a value is repeated across pairs, the pair that repeats it is dropped entirely, as with (1, 2) here:
s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0    0
0    1
1    4
1    5
dtype: object
Finally, drop those rows from the original DataFrame:
df = df.drop(final)
print (df)
  User Type  Qty
2    a  101   10
3    b  101   30
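The steps above collect every candidate pair in a set and resolve overlaps afterwards, so which overlapping pair survives depends on set iteration order. As a minimal sketch of a variant (not the answer's own code), the pairs can instead be accepted greedily in index order, skipping any index that has already been used. The helper name `drop_offsetting_pairs` and its `types` parameter are hypothetical.

```python
from itertools import combinations

import pandas as pd


def drop_offsetting_pairs(df, types=("101", "102")):
    """Hypothetical helper: drop row pairs whose Qty cancels within a User."""
    df = df.reset_index(drop=True)  # the approach needs unique index labels
    wanted = set(types)
    used = set()  # index labels already consumed by an accepted pair
    for _, group in df.groupby("User"):
        # combinations() yields pairs in index order, so earlier rows win
        for i, j in combinations(group.index, 2):
            if i in used or j in used:
                continue
            pair = df.loc[[i, j]]
            if set(pair["Type"]) == wanted and pair["Qty"].sum() == 0:
                used.update((i, j))
    return df.drop(index=sorted(used))


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
result = drop_offsetting_pairs(df)
print(result)
```

On the sample data this leaves the rows with index 2 and 3. Like the original, it checks every pair within a group, so the cost grows quadratically with group size.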
If there are only two 'Type' values (101 and 102 in this case), you can write a custom function as follows:
- Build a dictionary keyed by the absolute value of 'Qty'.
- Each value is the list of 'Type' values seen for that 'Qty'.
from collections import defaultdict
def f(x):
    new = defaultdict(list)
    for k,v in x[['Type', 'Qty']].itertuples(index=False,name=None):
        if not new[abs(v)]:
            new[abs(v)].append(k)
        elif new[abs(v)][-1] !=k:
            new[abs(v)].pop()
        else:
            new[abs(v)].append(k)
    return pd.Series(new,name='Qty').rename_axis(index='Type')
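The logic is simple:
- If the list is empty, add the current 'Type' to the list.
- If the list is not empty, check whether the last 'Type' in it equals the current 'Type' value. If they do not match, the last entry is removed. For example, if new = {10:['101']} and the current 'Type' is '102', then '101' is removed, so new = {10:[]}.
- If the last 'Type' and the current 'Type' match, just append the current 'Type' to the list. For example, if new = {10:['101']} and the current 'Type' is '101', it is appended, so new = {10:['101', '101']}.
df.groupby('User').apply(f).explode().dropna().reset_index()
  User  Type  Qty
0    a    10  101
1    b    30  101
Because f names the Series 'Qty' and its index 'Type', the labels in this output are swapped: the 'Type' column holds the absolute quantity and the 'Qty' column holds the Type.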
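The same cancellation logic can also be used to select the surviving rows of the original DataFrame rather than building a new Series. The following is a minimal sketch under that assumption: it keeps the index label next to each 'Type' on the list, so whatever remains after cancellation can be looked up with `.loc`. The helper name `surviving_index` is hypothetical, and like `f` it compares only the 'Type', not the sign of 'Qty'.

```python
from collections import defaultdict

import pandas as pd


def surviving_index(group):
    """Hypothetical helper: index labels left after stack-style cancellation."""
    stacks = defaultdict(list)  # abs(Qty) -> list of (Type, index label)
    rows = group[['Type', 'Qty']].itertuples(index=True, name=None)
    for idx, typ, qty in rows:
        stack = stacks[abs(qty)]
        if stack and stack[-1][0] != typ:
            stack.pop()  # a different Type is on top: the two rows cancel
        else:
            stack.append((typ, idx))  # empty list or same Type: keep the row
    return [idx for stack in stacks.values() for _, idx in stack]


df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})
keep = [idx for _, g in df.groupby('User') for idx in surviving_index(g)]
result = df.loc[sorted(keep)]
print(result)
```

On the sample data this keeps the rows with index 2 and 3, with the original column names and signed 'Qty' values intact.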
Iterating over all records and saving the matches in a list, while making sure no index is paired more than once, seems to work here.
import pandas as pd
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
'Type':['101','102','101','101','101','102'],
'Qty':[10, -10, 10, 30, 5, -5]})
# create a list to collect all indices that we are going to remove
records_to_remove = []
# a dictionary to map which group mirrors the other
pair = {'101': '102', '102':'101'}
# let's go over each row one by one
for i in df.index:
    current_record = df.loc[i]
    # if we haven't stored this index already for removal
    if i not in records_to_remove:
        pair_type = pair[current_record['Type']]
        pair_quantity = -1*current_record['Qty']
        # search for all possible matches to this row (note: across all users)
        match_records = df[(df['Type']==pair_type) & (df['Qty']==pair_quantity)]
        if match_records.empty:
            # if no matches found, move on to the next row
            continue
        else:
            # if a match is found, take the first of such records
            first_match_index = match_records.index[0]
            if first_match_index not in records_to_remove:
                # store the indices in the list to remove only if they're not already present
                records_to_remove.append(i)
                records_to_remove.append(first_match_index)
df = df.drop(records_to_remove)
Output:
  User Type  Qty
2    a  101   10
3    b  101   30
See if this works for you!
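The loop filters the whole DataFrame once per row, which may get slow on large inputs. One possible alternative, which is not part of the answer above, is a merge-based sketch: number the repeated (User, Type, Qty) rows with `cumcount`, build a "mirror" of each row with the opposite Type and negated Qty, and merge the two so that the n-th occurrence only pairs with the n-th mirror. The column names `row` and `n` are assumptions made for illustration, and unlike the loop this version only pairs rows within the same User.

```python
import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

# which Type mirrors the other
pair = {'101': '102', '102': '101'}

# keep the original index label in a helper column (hypothetical name "row")
work = df.reset_index().rename(columns={'index': 'row'})
# occurrence counter among identical (User, Type, Qty) rows
work['n'] = work.groupby(['User', 'Type', 'Qty']).cumcount()

# the row each record would need in order to cancel out
mirror = work.assign(Type=work['Type'].map(pair), Qty=-work['Qty'])

# a row is matched when its n-th occurrence equals some row's n-th mirror
matched = work.merge(mirror, on=['User', 'Type', 'Qty', 'n'],
                     suffixes=('', '_mirror'))

result = df.drop(index=matched['row'])
print(result)
```

On the sample data the matched rows are 0, 1, 4 and 5, so rows 2 and 3 remain, the same as the loop's output.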