Python - Pandas: calculation across 2 DataFrames based on column value
The input is 2 pandas DataFrames, df1 and df2:
df1
Document No Amount
6 8138000628REV 0.00
9 8138000602REV 0.00
24 8138000607REV 310.00
11 8138000605REV 0.00
14 813800602REV 0.00
45 8138000525AREV 0.00
84 8138000861REV 200000.00
87 8138000748REV -80770.82
df2
Document No Amount
2 8138000628 0.00
5 8138000602 0.00
12 8138000605 0.00
16 813800602 0.00
42 8138000525A 0.00
80 8138000861 215208.00
85 8138000748 80770.82
The required output is based on "Document No". For each "Document No" in df1 (ignoring its "REV" suffix): if it is not present in df2, make it part of df3. If it is present in df2 and the amounts in df1 and df2 differ (in absolute value), make it part of df3 using the "Document No" without the "REV" suffix, as in df2, with the amount being the difference (df2 Amount minus df1 Amount).
df3
Document No Amount
24 8138000607 310.00
84 8138000861 15208.00 -->(215208.00-200000.00)
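To reproduce the example without the Excel file, the two input frames can be typed in directly from the tables above. This sketch assumes "Document No" is stored as text in both frames; when read from Excel, purely numeric document numbers may come back as numbers instead.

```python
import pandas as pd

# Sample data copied from the question; the original index labels are kept
# so results can be compared against the expected df3.
df1 = pd.DataFrame(
    {
        "Document No": [
            "8138000628REV", "8138000602REV", "8138000607REV", "8138000605REV",
            "813800602REV", "8138000525AREV", "8138000861REV", "8138000748REV",
        ],
        "Amount": [0.00, 0.00, 310.00, 0.00, 0.00, 0.00, 200000.00, -80770.82],
    },
    index=[6, 9, 24, 11, 14, 45, 84, 87],
)

df2 = pd.DataFrame(
    {
        "Document No": [
            "8138000628", "8138000602", "8138000605", "813800602",
            "8138000525A", "8138000861", "8138000748",
        ],
        "Amount": [0.00, 0.00, 0.00, 0.00, 0.00, 215208.00, 80770.82],
    },
    index=[2, 5, 12, 16, 42, 80, 85],
)

print(df1)
print(df2)
```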
So far I have tried to achieve this with dictionaries and lists using the code snippet below. I am able to get the result, but I assume pandas can achieve the same in fewer lines of code. I am not well versed in pandas; could someone give me a hint and show me how to do this using pandas only?
%%time
import pandas as pd
Path_M='somepath'
df_led = pd.read_excel(Path_M + 'ABC Ltd_ recon.xlsx',
                       usecols = ['Document No','Remaining Amount'],
                       sheet_name='Ledger')
# One [Document No, Remaining Amount] list per ledger row
df_led['combined']=df_led.values.tolist()
list1 = df_led['combined'].tolist()
# Group the rows whose document number ends with "REV" by document number
thisdict_pir={}
for item in list1:
    ll_pir=[]
    key = item[0]
    key=str(key)
    if key.endswith('REV'):
        if key in thisdict_pir:
            var = thisdict_pir[key]
            var.append(item)
            thisdict_pir[key] = var
        else:
            ll_pir.append(item)
            thisdict_pir[key]=ll_pir
# NOTE: listofextdocno and thisdict are defined elsewhere (not shown in this snippet)
listofdocnumberwithnorev=[]
for item in listofextdocno:
    if item.endswith('REV'):
        listofdocnumberwithnorev.append(item[:-3])
thisdict_pi={}
for extdocno in listofdocnumberwithnorev:
    if extdocno in thisdict:
        data=thisdict[extdocno]
        thisdict_pi[extdocno]=data
# Compare the summed REV amounts with the matching non-REV amounts
listofextdocnoin=thisdict_pir.keys()
finaldict={}
for inv in listofextdocnoin:
    listofpir=thisdict_pir[inv]
    #print(listofpir)
    if inv[:-3] in thisdict_pi:
        listofpi=thisdict_pi[inv[:-3]]
        #print(listofpi)
    else:
        listofpi=[]
    print(listofpi)
    if (len(listofpir)>0):
        #print(listofpir)
        amtinvr=0
        for pinvr in listofpir:
            amtinvr=pinvr[5]+amtinvr
        #print(amtinvr)
        if (len(listofpi)>0):
            #print(listofpi)
            amtinv=0
            for pinv in listofpi:
                amtinv=pinv[5]+amtinv
            #print(amtinv)
            if abs(amtinvr) != abs(amtinv):
                val=pinvr
                finaldict[inv]=val
        elif len(listofpi)<1:
            finaldict[inv]=pinvr
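If df1 and df2 both come from the single 'Ledger' sheet read above (an assumption: the snippet only shows the "REV" rows being grouped), one way to split that sheet into the two frames with pandas is sketched below. The `df_led` contents here are a hypothetical in-memory stand-in for the Excel data, and "Remaining Amount" is renamed to "Amount" to match df1/df2.

```python
import pandas as pd

# Hypothetical stand-in for the 'Ledger' sheet that the question reads with
# pd.read_excel(..., usecols=['Document No', 'Remaining Amount']).
# One document number is an int to mimic a numeric cell coming back from Excel.
df_led = pd.DataFrame(
    {
        "Document No": ["8138000607REV", "8138000861", "8138000861REV",
                        8138000748, "8138000748REV"],
        "Remaining Amount": [310.00, 215208.00, 200000.00, 80770.82, -80770.82],
    }
)

# Treat every document number as text, as the question's loop does with str(key).
docs = df_led["Document No"].astype(str)
is_rev = docs.str.endswith("REV")

# Use the same column names as df1/df2 in the question.
ledger = df_led.assign(**{"Document No": docs}).rename(
    columns={"Remaining Amount": "Amount"}
)
df1 = ledger[is_rev]   # reversal ("REV") rows
df2 = ledger[~is_rev]  # original rows

print(df1)
print(df2)
```

Casting to text first matters because a merge on "Document No" needs the same dtype on both sides.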
You can strip the "REV" suffix from df1's "Document No", left-merge the result with df2, and then filter out the rows you don't need:
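df3 = (
    # Strip the trailing "REV" so df1's document numbers match df2's
    df1.assign(**{'Document No': df1['Document No'].replace('REV$', '', regex=True)})
    # Left merge: df2's amount becomes Amount2 (NaN when there is no match)
    .merge(df2, how='left', on='Document No', indicator=True, suffixes=('', '2'))
    # Keep unmatched rows and rows whose amounts do not cancel out
    .query("(_merge == 'left_only') | (Amount != -Amount2)")
    # Matched rows get Amount2 - Amount; unmatched rows keep Amount (2*Amount - Amount)
    .assign(Amount=lambda x: x['Amount2'].fillna(2*x['Amount']).sub(x['Amount']))
    [['Document No', 'Amount']]
)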
Output:
>>> df3
Document No Amount
2 8138000607 310.0
6 8138000861 15208.0
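To see why the `query` filter selects these two rows, it can help to look at the merged frame before filtering. This sketch uses a subset of the question's sample data (one row per case) and evaluates the same condition as a boolean mask.

```python
import pandas as pd

# A subset of the question's sample data: a zero row, an unmatched row,
# a row with differing amounts, and a row that cancels out.
df1 = pd.DataFrame(
    {"Document No": ["8138000628REV", "8138000607REV", "8138000861REV", "8138000748REV"],
     "Amount": [0.00, 310.00, 200000.00, -80770.82]},
    index=[6, 24, 84, 87],
)
df2 = pd.DataFrame(
    {"Document No": ["8138000628", "8138000861", "8138000748"],
     "Amount": [0.00, 215208.00, 80770.82]},
    index=[2, 80, 85],
)

# Step 1: strip the trailing "REV" and left-merge, keeping the _merge indicator.
merged = (
    df1.assign(**{"Document No": df1["Document No"].replace("REV$", "", regex=True)})
    .merge(df2, how="left", on="Document No", indicator=True, suffixes=("", "2"))
)

# Step 2: the answer's filter condition, evaluated row by row.
keep = (merged["_merge"] == "left_only") | (merged["Amount"] != -merged["Amount2"])
print(merged.assign(keep=keep))
```

For `left_only` rows `Amount2` is NaN, and a `!=` comparison against NaN is already True, so the `_merge` check mostly makes the intent explicit. Note also that `merge` numbers the result 0, 1, 2, ..., which is why the output above shows index 2 and 6 rather than df1's labels.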
Update: to preserve the index from df1 (24, 84), use this modified version:
df3 = (
    df1.assign(**{'Document No': df1['Document No'].replace('REV$', '', regex=True)})
    .reset_index()
    .merge(df2, how='left', on='Document No', indicator=True, suffixes=('', '2'))
    .query("(_merge == 'left_only') | (Amount != -Amount2)")
    .assign(Amount=lambda x: x['Amount2'].fillna(2*x['Amount']).sub(x['Amount']))
    .set_index('index')[['Document No', 'Amount']].rename_axis(None)
)
Output:
>>> df3
Document No Amount
24 8138000607 310.0
84 8138000861 15208.0
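A different technique (an alternative sketch, not part of the answer above) avoids the `reset_index`/`set_index` round trip: look up df2's amounts with `Series.map`, which keeps df1's index as-is. It assumes each "Document No" appears at most once in df2, because `map` with a Series needs a unique lookup index. The data is a subset of the question's sample.

```python
import pandas as pd

# A subset of the question's sample data, with df1's original index labels.
df1 = pd.DataFrame(
    {"Document No": ["8138000628REV", "8138000607REV", "8138000861REV", "8138000748REV"],
     "Amount": [0.00, 310.00, 200000.00, -80770.82]},
    index=[6, 24, 84, 87],
)
df2 = pd.DataFrame(
    {"Document No": ["8138000628", "8138000861", "8138000748"],
     "Amount": [0.00, 215208.00, 80770.82]},
    index=[2, 80, 85],
)

# Strip the trailing "REV" (anchored with $ so only a suffix is removed).
doc = df1["Document No"].str.replace(r"REV$", "", regex=True)

# Look up df2's amount for each stripped document number (NaN if absent).
amount2 = doc.map(df2.set_index("Document No")["Amount"])

# Keep rows with no counterpart in df2, or whose amounts do not cancel out.
keep = amount2.isna() | (df1["Amount"] != -amount2)

# Difference df2 - df1 where matched; otherwise keep df1's own amount.
df3 = pd.DataFrame(
    {"Document No": doc, "Amount": amount2.sub(df1["Amount"]).fillna(df1["Amount"])}
)[keep]
print(df3)
```

Every intermediate Series shares df1's index, so the result keeps labels 24 and 84 without any extra bookkeeping.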