Python - Pandas: calculation on 2 dataframes based on column values

Question

Input is 2 pandas DataFrames, df1 and df2:

df1

            Document No            Amount
6         8138000628REV              0.00
9         8138000602REV              0.00
24        8138000607REV            310.00
11        8138000605REV              0.00
14         813800602REV              0.00
45       8138000525AREV              0.00
84        8138000861REV         200000.00
87        8138000748REV         -80770.82

df2

            Document No            Amount
2            8138000628              0.00
5            8138000602              0.00
12           8138000605              0.00
16            813800602              0.00
42          8138000525A              0.00
80           8138000861         215208.00
85           8138000748          80770.82
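To reproduce the example, the two input frames can be built directly from the tables above. The index values are copied from the printed output; in the real workbook the data comes from `read_excel`, so this is only a stand-in:

```python
import pandas as pd

# df1: reversal entries, document numbers carry a "REV" suffix
df1 = pd.DataFrame(
    {
        "Document No": [
            "8138000628REV", "8138000602REV", "8138000607REV", "8138000605REV",
            "813800602REV", "8138000525AREV", "8138000861REV", "8138000748REV",
        ],
        "Amount": [0.00, 0.00, 310.00, 0.00, 0.00, 0.00, 200000.00, -80770.82],
    },
    index=[6, 9, 24, 11, 14, 45, 84, 87],
)

# df2: original entries, same document numbers without "REV"
df2 = pd.DataFrame(
    {
        "Document No": [
            "8138000628", "8138000602", "8138000605", "813800602",
            "8138000525A", "8138000861", "8138000748",
        ],
        "Amount": [0.00, 0.00, 0.00, 0.00, 0.00, 215208.00, 80770.82],
    },
    index=[2, 5, 12, 16, 42, 80, 85],
)
```

If `read_excel` returns purely numeric document numbers as integers (the question's loop calls `str()` on every key, which hints at this), converting the column with `.astype(str)` before matching is probably needed; that is an assumption about the workbook.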

The required output is based on "Document No". For each "Document No" in df1, if it is not present in df2 (ignoring the "REV" suffix), make it part of df3. If it is present in df2 and the Amount differs between df1 and df2, make it part of df3 with the "Document No" from df2 (without the "REV" suffix), and the amount will be the difference (df2 amount minus df1 amount). Amounts that only differ in sign, such as -80770.82 and 80770.82 for 8138000748, count as equal, as the expected output shows.

df3

            Document No            Amount
24           8138000607            310.00
84           8138000861          15208.00  -->(215208.00-200000.00)
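The rule can be stated per document number. A minimal sketch, assuming "different" means the absolute values differ (following the `abs()` comparison in the loop below); `df3_amount` is a hypothetical helper name, and `math.isclose` is a choice made here to avoid exact float comparison:

```python
import math
from typing import Optional


def df3_amount(rev_amount: float, base_amount: Optional[float]) -> Optional[float]:
    """Return the df3 amount for one document, or None to leave it out.

    rev_amount: Amount of the "...REV" row in df1.
    base_amount: Amount of the matching row in df2, or None if there is none.
    """
    if base_amount is None:
        # Not in df2: keep df1's amount unchanged
        return rev_amount
    if math.isclose(abs(rev_amount), abs(base_amount)):
        # Same size (sign ignored): nothing to report
        return None
    # Present but different: report df2 amount minus df1 amount
    return base_amount - rev_amount


print(df3_amount(310.00, None))          # 310.0   (8138000607)
print(df3_amount(200000.00, 215208.00))  # 15208.0 (8138000861)
print(df3_amount(-80770.82, 80770.82))   # None    (8138000748)
```

The answer below tests `Amount != -Amount2` instead; the two conditions only disagree when both amounts are equal, nonzero and of the same sign.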

So far I have tried to achieve this with dictionaries and lists using the code snippet below, and I am able to get the result, but I assume Pandas has some capability to do the same in fewer lines of code. I am not well versed in Pandas; could someone give me a hint and show me how to achieve the same using pandas only?

%%time
import pandas as pd

Path_M = 'somepath'
df_led = pd.read_excel(Path_M + 'ABC Ltd_ recon.xlsx',
                       usecols=['Document No', 'Remaining Amount'],
                       sheet_name='Ledger')

# One [document_no, amount] list per ledger row
list1 = df_led[['Document No', 'Remaining Amount']].values.tolist()

# Group rows by document number: thisdict_pir holds the "...REV" rows,
# thisdict holds the other rows. (The earlier version used thisdict and
# listofextdocno without defining them; building thisdict here is an assumption.)
thisdict_pir = {}
thisdict = {}
for item in list1:
    key = str(item[0])
    if key.endswith('REV'):
        thisdict_pir.setdefault(key, []).append(item)
    else:
        thisdict.setdefault(key, []).append(item)

finaldict = {}
for inv, listofpir in thisdict_pir.items():
    # Matching non-REV rows, if any (inv[:-3] strips "REV")
    listofpi = thisdict.get(inv[:-3], [])
    # Column 1 is 'Remaining Amount' (index 5 does not exist with two usecols)
    amtinvr = sum(row[1] for row in listofpir)
    amtinv = sum(row[1] for row in listofpi)
    # Keep the last REV row if the totals differ in size or there is no match
    if abs(amtinvr) != abs(amtinv) or not listofpi:
        finaldict[inv] = listofpir[-1]

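The loop above groups the ledger rows by document number and sums their amounts. In pandas that grouping is a `groupby(...).sum()`, which also covers the case where one document number has several rows. A sketch assuming a single ledger frame with the `Document No` / `Remaining Amount` columns from the `read_excel` call; the inline rows are copied from the example tables, and `rev_totals`, `base_totals` and `result` are illustrative names:

```python
import pandas as pd

# Stand-in for df_led, using rows from the example tables
df_led = pd.DataFrame(
    {
        "Document No": ["8138000607REV", "8138000861REV", "8138000748REV",
                        "8138000861", "8138000748"],
        "Remaining Amount": [310.00, 200000.00, -80770.82, 215208.00, 80770.82],
    }
)

doc = df_led["Document No"].astype(str)
is_rev = doc.str.endswith("REV")
key = doc.str.replace(r"REV$", "", regex=True)  # document no without "REV"

# One total per document number, like the summing loops over thisdict_pir
rev_totals = df_led.loc[is_rev, "Remaining Amount"].groupby(key[is_rev]).sum()
base_totals = df_led.loc[~is_rev, "Remaining Amount"].groupby(key[~is_rev]).sum()

# Line the totals up side by side; keep only documents that have a REV row
totals = pd.concat({"rev": rev_totals, "base": base_totals}, axis=1)
totals = totals[totals["rev"].notna()]

# Same abs() rule as the loop: keep if unmatched or the sizes differ
keep = totals["base"].isna() | (totals["rev"].abs() != totals["base"].abs())
result = (totals["base"] - totals["rev"]).where(totals["base"].notna(), totals["rev"])[keep]
print(result)
```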
Answer 1

You can merge your 2 dataframes and then filter the result:

df3 = (
   df1.assign(**{'Document No': df1['Document No'].replace('REV$', '', regex=True)})
      .merge(df2, how='left', on='Document No', indicator=True, suffixes=('', '2'))
      .query("(_merge == 'left_only') | (Amount != -Amount2)")
      .assign(Amount=lambda x: x['Amount2'].fillna(2*x['Amount']).sub(x['Amount']))
      [['Document No', 'Amount']]
)

Output:

>>> df3
  Document No   Amount
2  8138000607    310.0
6  8138000861  15208.0
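To see what the `query` step filters on, it can help to look at the merged frame before filtering. A sketch that rebuilds the sample data so it runs on its own; `merged` is just an illustrative name:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Document No": ["8138000628REV", "8138000602REV", "8138000607REV", "8138000605REV",
                    "813800602REV", "8138000525AREV", "8138000861REV", "8138000748REV"],
    "Amount": [0.00, 0.00, 310.00, 0.00, 0.00, 0.00, 200000.00, -80770.82],
}, index=[6, 9, 24, 11, 14, 45, 84, 87])
df2 = pd.DataFrame({
    "Document No": ["8138000628", "8138000602", "8138000605", "813800602",
                    "8138000525A", "8138000861", "8138000748"],
    "Amount": [0.00, 0.00, 0.00, 0.00, 0.00, 215208.00, 80770.82],
}, index=[2, 5, 12, 16, 42, 80, 85])

# Same merge as the answer, stopped before .query()
merged = (
    df1.assign(**{"Document No": df1["Document No"].replace("REV$", "", regex=True)})
       .merge(df2, how="left", on="Document No", indicator=True, suffixes=("", "2"))
)

# _merge is 'left_only' where df2 has no matching document (Amount2 is NaN)
# and 'both' otherwise; the query keeps left_only rows and rows where
# Amount and Amount2 do not cancel out
print(merged[["Document No", "Amount", "Amount2", "_merge"]])
```

Note that the left merge renumbers the rows 0 to 7, which is why the first output shows index 2 and 6 instead of df1's 24 and 84.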

Update: to preserve the index from df1 (24, 84), use this modified version:

df3 = (
   df1.assign(**{'Document No': df1['Document No'].replace('REV$', '', regex=True)})
      .reset_index()
      .merge(df2, how='left', on='Document No', indicator=True, suffixes=('', '2'))
      .query("(_merge == 'left_only') | (Amount != -Amount2)")
      .assign(Amount=lambda x: x['Amount2'].fillna(2*x['Amount']).sub(x['Amount']))
      .set_index('index')[['Document No', 'Amount']].rename_axis(None)
)

Output:

>>> df3
   Document No   Amount
24  8138000607    310.0
84  8138000861  15208.0
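Another way to keep df1's index, without the `reset_index`/`set_index` round trip, is to look up df2's amounts with `Series.map` instead of merging. This is a swapped-in technique, not the answer's method; it assumes "Document No" is unique in df2 (the lookup Series needs a unique index) and reuses the answer's `Amount != -Amount2` condition:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Document No": ["8138000628REV", "8138000602REV", "8138000607REV", "8138000605REV",
                    "813800602REV", "8138000525AREV", "8138000861REV", "8138000748REV"],
    "Amount": [0.00, 0.00, 310.00, 0.00, 0.00, 0.00, 200000.00, -80770.82],
}, index=[6, 9, 24, 11, 14, 45, 84, 87])
df2 = pd.DataFrame({
    "Document No": ["8138000628", "8138000602", "8138000605", "813800602",
                    "8138000525A", "8138000861", "8138000748"],
    "Amount": [0.00, 0.00, 0.00, 0.00, 0.00, 215208.00, 80770.82],
}, index=[2, 5, 12, 16, 42, 80, 85])

base = df1["Document No"].str.replace("REV$", "", regex=True)
# Amount of the matching df2 row (NaN if none); the index stays df1's
amount2 = base.map(df2.set_index("Document No")["Amount"])

keep = amount2.isna() | (df1["Amount"] != -amount2)
df3 = pd.DataFrame({
    "Document No": base[keep],
    # Difference where df2 has a match, otherwise df1's own amount
    "Amount": amount2.sub(df1["Amount"]).fillna(df1["Amount"])[keep],
})
print(df3)
```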

Solution 1
1 2022-01-16 16:11:15

Python - Pandas 2 dataframe 基于列值计算

问题描述

1 个解决方案

解决方案1 1 2022-01-16 16:11:15

解决方案1
1 2022-01-16 16:11:15