简体   繁体   中英

How to append data to dataframe based on another column?

I would like to append values from one dataframe into another by checking if they contain the same Document Number.

ip_df:

        CardName     DocNum    DocDate      DocTotal    DocNum2    PaidToDate   Balance
0       CompanyA  800100001 2021-03-01  10000.000000  920000000  10000.000000  0.000000
1       CompanyA  800100002 2021-03-01  20000.000000  920000000  20000.000000  0.000000
2       CompanyA  800100003 2021-03-01  30000.000000  920000000  30000.000000  0.000000
3       CompanyA  800100004 2021-03-01  40000.000000  920000000  40000.000000  0.000000
4       CompanyA  800100005 2021-03-01  50000.000000  920000000  50000.000000  0.000000
..      ...        ...        ...           ...        ...           ...       ...
94      CompanyY  800100006 2021-03-01  60000.000000  920000005  60000.000000  0.000000
95      CompanyY  800100007 2021-03-01  70000.000000  920000005  70000.000000  0.000000
96      CompanyY  800100008 2021-03-01  80000.000000  920000005  80000.000000  0.000000
97      CompanyZ  800100009 2021-03-01  90000.000000  920000006  90000.000000  0.000000
98      CompanyZ  800100010 2021-03-01  11000.000000  920000006  11000.000000  0.000000

[99 rows x 7 columns]

ar_df:

        BPCode       Balance Currency    DueDate  BPName  TransId       Ref1      Payment Received  Bank Charge
0     XXXXXXXX  10000.000000      USD 2020-09-29  CompanyA   503378  800100001          0            0
1     XXXXXXXX  20000.000000      USD 2021-03-01  CompanyA   543103  800100002          0            0
2     XXXXXXXX  30000.000000      USD 2021-03-01  CompanyA   543171  800100003          0            0
3     XXXXXXXX  40000.000000      USD 2021-03-01  CompanyA   544205  800100004          0            0
4     XXXXXXXX  50000.000000      USD 2021-03-01  CompanyA   544222  800100005          0            0
...        ...           ...      ...        ...  ...      ...        ...               ...          ...
3763  XXXXXXXX  60000.000000      USD 2021-03-02  CompanyY   548612  800100006          0            0
3764  XXXXXXXX  70000.000000      USD 2021-03-02  CompanyY   547727  800100007          0            0
3765  XXXXXXXX  80000.000000      USD 2021-03-30  CompanyY   553819  800100008          0            0
3766  XXXXXXXX  90000.000000      USD 2021-04-01  CompanyZ   547707  800100009          0            0
3767  XXXXXXXX  11000.000000      USD 2021-04-29  CompanyZ   556102  800100010          0            0

[3768 rows x 9 columns]

I am trying to do this:

for row in ip_df:
if ip_df.row['DocNum'] == ar_df.row['Ref1']:
    ap_df.row['Payment Received'] = ip_df.row['PaidToDate']

But I am not too sure of the correct way to doing this.

I have tried searching around, but mostly, the answers always seem to point towards checking against scalar values only.

What I want to achieve is checking if in a row in ip_df, whether 'DocNum' can be found in the ap_df's 'Ref1' column, and if true, set ap_df's 'Payment Received' row to be a value from ip_df's 'PaidToDate' row.

The end result should look like:

        BPCode       Balance Currency    DueDate  BPName  TransId       Ref1      Payment Received  Bank Charge
0     XXXXXXXX  10000.000000      USD 2020-09-29  CompanyA   503378  800100001          10000.000000 0
1     XXXXXXXX  20000.000000      USD 2021-03-01  CompanyA   543103  800100002          20000.000000 0
2     XXXXXXXX  30000.000000      USD 2021-03-01  CompanyA   543171  800100003          30000.000000 0
3     XXXXXXXX  40000.000000      USD 2021-03-01  CompanyA   544205  800100004          40000.000000 0
4     XXXXXXXX  50000.000000      USD 2021-03-01  CompanyA   544222  800100005          50000.000000 0
...        ...           ...      ...        ...  ...      ...        ...               ...          ...
3763  XXXXXXXX  60000.000000      USD 2021-03-02  CompanyY   548612  800114258          60000.000000 0
3764  XXXXXXXX  70000.000000      USD 2021-03-02  CompanyY   547727  800113975          70000.000000 0
3765  XXXXXXXX  80000.000000      USD 2021-03-30  CompanyY   553819  800115292          80000.000000 0
3766  XXXXXXXX  90000.000000      USD 2021-04-01  CompanyZ   547707  800113957          90000.000000 0
3767  XXXXXXXX  11000.000000      USD 2021-04-29  CompanyZ   556102  800115741          11000.000000 0

[3768 rows x 9 columns]

Thank you for any help in advance!!

Use df.merge on a subset of columns with df.rename :

In [220]: ar_df['Ref1'] = ar_df['Ref1'].str.replace('', np.nan).astype(int)
In [221]: ip_df['DocNum'] = ar_df['DocNum'].str.replace('', np.nan).astype(int)

In [222]: ar_df.merge(ip_df[['DocNum', 'PaidToDate']], left_on='Ref1', right_on='DocNum').drop(['Payment Received', 'DocNum'], 1).rename(columns={'PaidToDate':'Payment Received'})
Out[222]: 
     BPCode  Balance Currency     DueDate    BPName  TransId       Ref1  Bank_Charge  Payment_Received
0  XXXXXXXX  10000.0      USD  2020-09-29  CompanyA   503378  800100001            0           10000.0
1  XXXXXXXX  20000.0      USD  2021-03-01  CompanyA   543103  800100002            0           20000.0
2  XXXXXXXX  30000.0      USD  2021-03-01  CompanyA   543171  800100003            0           30000.0
3  XXXXXXXX  40000.0      USD  2021-03-01  CompanyA   544205  800100004            0           40000.0
4  XXXXXXXX  50000.0      USD  2021-03-01  CompanyA   544222  800100005            0           50000.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM