Python - Pandas 2 dataframe calculation based on column value

Question

The input is two pandas DataFrames, df1 and df2.

df1

            Document No            Amount
6         8138000628REV              0.00
9         8138000602REV              0.00
24        8138000607REV            310.00
11        8138000605REV              0.00
14         813800602REV              0.00
45       8138000525AREV              0.00
84        8138000861REV         200000.00
87        8138000748REV         -80770.82

df2

            Document No            Amount
2            8138000628              0.00
5            8138000602              0.00
12           8138000605              0.00
16            813800602              0.00
42          8138000525A              0.00
80           8138000861         215208.00
85           8138000748          80770.82
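
To make the example reproducible, the two frames above can be rebuilt roughly as follows. This is only a sketch: it assumes "Document No" holds strings and "Amount" holds floats, which the printed tables do not state explicitly.

```python
import pandas as pd

# Sample data copied from the tables above; the index values are the
# original row labels shown in the left-hand column.
df1 = pd.DataFrame(
    {
        "Document No": [
            "8138000628REV", "8138000602REV", "8138000607REV", "8138000605REV",
            "813800602REV", "8138000525AREV", "8138000861REV", "8138000748REV",
        ],
        "Amount": [0.00, 0.00, 310.00, 0.00, 0.00, 0.00, 200000.00, -80770.82],
    },
    index=[6, 9, 24, 11, 14, 45, 84, 87],
)

df2 = pd.DataFrame(
    {
        "Document No": [
            "8138000628", "8138000602", "8138000605", "813800602",
            "8138000525A", "8138000861", "8138000748",
        ],
        "Amount": [0.00, 0.00, 0.00, 0.00, 0.00, 215208.00, 80770.82],
    },
    index=[2, 5, 12, 16, 42, 80, 85],
)
```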

The required output is based on "Document No". For each "Document No" in df1, if the matching "Document No" (the same number without the "REV" suffix) is not present in df2, the row becomes part of df3. If it is present in df2 and the Amount differs between df1 and df2, the row becomes part of df3 with the "Document No" taken from df2 (without the "REV" keyword), and the Amount is the difference between the two amounts.

df3

            Document No            Amount
24           8138000607            310.00
84           8138000861          15208.00  -->(215208.00-200000.00)

So far I have achieved this with dictionaries and lists, using the code below, and I do get the expected result. I assume pandas can do the same in far fewer lines of code, but I am not well versed in it. Can someone give me a hint or show me how to achieve this using pandas only?

%%time
import pandas as pd
Path_M='somepath'
df_led = pd.read_excel(Path_M + 'ABC Ltd_ recon.xlsx',
                  usecols = ['Document No','Remaining Amount'],
                  sheet_name='Ledger')

# Each row becomes a [document_no, amount] list
list1 = df_led.values.tolist()

# Group the rows of 'REV' documents by document number
thisdict_pir = {}
for item in list1:
    key = str(item[0])
    if key.endswith('REV'):
        thisdict_pir.setdefault(key, []).append(item)

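# Assumed definitions, missing from the snippet as posted: all document
# numbers as strings, and the non-REV rows grouped by document number
listofextdocno = [str(item[0]) for item in list1]
thisdict = {}
for item in list1:
    if not str(item[0]).endswith('REV'):
        thisdict.setdefault(str(item[0]), []).append(item)

# Document numbers of the REV rows with the 'REV' suffix removed
listofdocnumberwithnorev = [doc[:-3] for doc in listofextdocno if doc.endswith('REV')]

# Non-REV rows that have a REV counterpart
thisdict_pi = {doc: thisdict[doc] for doc in listofdocnumberwithnorev if doc in thisdict}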
        
listofextdocnoin=thisdict_pir.keys()

finaldict = {}
for inv in listofextdocnoin:
    listofpir = thisdict_pir[inv]
    # Rows of the matching document without the 'REV' suffix, if any
    listofpi = thisdict_pi.get(inv[:-3], [])
    # Sum the amounts on each side; index 1 is 'Remaining Amount',
    # since only two columns are read from the sheet above
    amtinvr = sum(pinvr[1] for pinvr in listofpir)
    amtinv = sum(pinv[1] for pinv in listofpi)
    # Keep the last REV row when there is no counterpart or the totals differ
    if len(listofpi) < 1 or abs(amtinvr) != abs(amtinv):
        finaldict[inv] = listofpir[-1]

Answer 1

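You can merge the two DataFrames and then filter the result: strip the trailing "REV" from df1's "Document No", left-merge with df2, keep the rows that have no match in df2 or whose Amount is not the negation of df2's Amount, and then compute the new Amount.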

df3 = (
   df1.assign(**{'Document No': df1['Document No'].replace('REV$', '', regex=True)})
      .merge(df2, how='left', on='Document No', indicator=True, suffixes=('', '2'))
      .query("(_merge == 'left_only') | (Amount != -Amount2)")
      .assign(Amount=lambda x: x['Amount2'].fillna(2*x['Amount']).sub(x['Amount']))
      [['Document No', 'Amount']]
)

Output:

>>> df3
  Document No   Amount
2  8138000607    310.0
6  8138000861  15208.0
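
The chain above can also be unrolled into named steps, which may make each stage easier to inspect. This is a sketch of the same logic rather than a different method; the intermediate names (`left`, `merged`, `mask`, `kept`) are made up for illustration, and the sample frames are rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Document No": ["8138000628REV", "8138000602REV", "8138000607REV",
                     "8138000605REV", "813800602REV", "8138000525AREV",
                     "8138000861REV", "8138000748REV"],
     "Amount": [0.00, 0.00, 310.00, 0.00, 0.00, 0.00, 200000.00, -80770.82]},
    index=[6, 9, 24, 11, 14, 45, 84, 87],
)
df2 = pd.DataFrame(
    {"Document No": ["8138000628", "8138000602", "8138000605", "813800602",
                     "8138000525A", "8138000861", "8138000748"],
     "Amount": [0.00, 0.00, 0.00, 0.00, 0.00, 215208.00, 80770.82]},
    index=[2, 5, 12, 16, 42, 80, 85],
)

# Step 1: strip the trailing 'REV' so both frames share the same key
left = df1.assign(
    **{"Document No": df1["Document No"].str.replace("REV$", "", regex=True)}
)

# Step 2: left merge; the `_merge` column records whether the key exists in df2
merged = left.merge(df2, how="left", on="Document No",
                    indicator=True, suffixes=("", "2"))

# Step 3: keep rows with no counterpart in df2, or whose amount is not
# the negation of the df2 amount
mask = (merged["_merge"] == "left_only") | (merged["Amount"] != -merged["Amount2"])
kept = merged[mask]

# Step 4: df2 amount minus df1 amount; where df2 has no row, filling with
# 2 * Amount makes the subtraction return the original df1 amount
amount = kept["Amount2"].fillna(2 * kept["Amount"]).sub(kept["Amount"])
df3 = kept.assign(Amount=amount)[["Document No", "Amount"]]
print(df3)
```

Note that the merge resets the row labels to 0..7, which is why the result is indexed 2 and 6 rather than 24 and 84.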

Update: to preserve the index from df1 (24, 84), use this modified version:

df3 = (
   df1.assign(**{'Document No': df1['Document No'].replace('REV$', '', regex=True)})
      .reset_index()
      .merge(df2, how='left', on='Document No', indicator=True, suffixes=('', '2'))
      .query("(_merge == 'left_only') | (Amount != -Amount2)")
      .assign(Amount=lambda x: x['Amount2'].fillna(2*x['Amount']).sub(x['Amount']))
      .set_index('index')[['Document No', 'Amount']].rename_axis(None)
)

Output:

>>> df3
   Document No   Amount
24  8138000607    310.0
84  8138000861  15208.0
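
As a possible variant, the loop in the question compares absolute values (`abs(amtinvr) != abs(amtinv)`) rather than checking for an exact negation. A minimal sketch of that comparison follows. It uses `Series.map` instead of a merge, so df1's index is kept without a reset. It assumes "Document No" is unique in df2, and the `1e-9` tolerance is an arbitrary choice for illustration, not something taken from the question.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Document No": ["8138000628REV", "8138000602REV", "8138000607REV",
                     "8138000605REV", "813800602REV", "8138000525AREV",
                     "8138000861REV", "8138000748REV"],
     "Amount": [0.00, 0.00, 310.00, 0.00, 0.00, 0.00, 200000.00, -80770.82]},
    index=[6, 9, 24, 11, 14, 45, 84, 87],
)
df2 = pd.DataFrame(
    {"Document No": ["8138000628", "8138000602", "8138000605", "813800602",
                     "8138000525A", "8138000861", "8138000748"],
     "Amount": [0.00, 0.00, 0.00, 0.00, 0.00, 215208.00, 80770.82]},
    index=[2, 5, 12, 16, 42, 80, 85],
)

# Key without the trailing 'REV'; keeps df1's index
key = df1["Document No"].str.replace("REV$", "", regex=True)

# Look up the df2 amount for each key (NaN when the key is absent).
# Assumption: "Document No" is unique in df2.
amount2 = key.map(df2.set_index("Document No")["Amount"])

missing = amount2.isna()
# Compare magnitudes only; comparisons against NaN evaluate to False
differs = (df1["Amount"].abs() - amount2.abs()).abs() > 1e-9

# df2 minus df1 where a counterpart exists, otherwise the df1 amount itself
amount = (amount2 - df1["Amount"]).where(~missing, df1["Amount"])

df3 = pd.DataFrame({"Document No": key, "Amount": amount})[missing | differs]
print(df3)
```

On the sample data this selects the same two rows as the merge-based version, but the two approaches can disagree when df1 and df2 hold equal non-zero amounts with the same sign.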

solution1
1 2022-01-16 16:11:15
