How to add rows based on a condition with another dataframe

I have two dataframes as follows:

agreement

  agreement_id activation  term_months  total_fee
0            A 2020-12-01           24       4800
1            B 2021-01-02            6        300
2            C 2021-01-21            6        600
3            D 2021-03-04            6        300

payments

    cust_id agreement_id       date  payment
0         1            A 2020-12-01      200
1         1            A 2021-02-02      200
2         1            A 2021-02-03      100
3         1            A 2021-05-01      200
4         1            B 2021-01-02       50
5         1            B 2021-01-09       20
6         1            B 2021-03-01       80
7         1            B 2021-04-23       90
8         2            C 2021-01-21      600
9         3            D 2021-03-04      150
10        3            D 2021-05-03      150
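
For anyone who wants to reproduce the problem, the two frames above can be built roughly as follows. This is a minimal sketch: the column dtypes (datetime for the date columns, integers for the amounts) are assumptions, since only the printed tables are given. The `paid` variable is a hypothetical name for the per-agreement totals.

```python
import pandas as pd

# Sample data copied from the tables above (dtypes are assumed).
agreement = pd.DataFrame({
    "agreement_id": ["A", "B", "C", "D"],
    "activation": pd.to_datetime(
        ["2020-12-01", "2021-01-02", "2021-01-21", "2021-03-04"]
    ),
    "term_months": [24, 6, 6, 6],
    "total_fee": [4800, 300, 600, 300],
})

payments = pd.DataFrame({
    "cust_id": [1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 3],
    "agreement_id": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "D", "D"],
    "date": pd.to_datetime([
        "2020-12-01", "2021-02-02", "2021-02-03", "2021-05-01",
        "2021-01-02", "2021-01-09", "2021-03-01", "2021-04-23",
        "2021-01-21", "2021-03-04", "2021-05-03",
    ]),
    "payment": [200, 200, 100, 200, 50, 20, 80, 90, 600, 150, 150],
})

# Total paid per agreement, to compare against total_fee.
paid = payments.groupby("agreement_id")["payment"].sum()
print(paid)
```

With this data the totals are 700 for A, 240 for B, 600 for C and 300 for D, so only C and D match their total_fee.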

I want to add another row to the payments dataframe whenever the total payments for an agreement_id in the payments dataframe equal the total_fee for that agreement_id in the agreement dataframe. The new row would have a payment of zero, and its date would be calculated as min(date) (from payments) plus term_months (from agreement).

Here's the result I want for the payments dataframe:

payments

    cust_id agreement_id       date  payment
0         1            A 2020-12-01      200
1         1            A 2021-02-02      200
2         1            A 2021-02-03      100
3         1            A 2021-05-01      200
4         1            B 2021-01-02       50
5         1            B 2021-01-09       20
6         1            B 2021-03-01       80
7         1            B 2021-04-23       90
8         2            C 2021-01-21      600
9         3            D 2021-03-04      150
10        3            D 2021-05-03      150
11        2            C 2021-07-21        0
12        3            D 2021-09-04        0

The additional rows are rows 11 and 12. The total payments for agreement_id 'C' and 'D' were equal to the total_fee shown in the agreement dataframe.

import pandas as pd
import numpy as np

First, convert the 'date' column of the payments dataframe to datetime dtype using the to_datetime() method:

payments['date']=pd.to_datetime(payments['date'])

Next, use the groupby() method to get the total payment, the first payment date, and the customer for each agreement:

newdf=payments.groupby('agreement_id').agg({'payment':'sum','date':'min','cust_id':'first'}).reset_index()

Now use boolean masking to keep only the rows that meet your condition:

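# look up total_fee by agreement_id instead of relying on both frames sharing the same row order
newdf=newdf[newdf['agreement_id'].map(agreement.set_index('agreement_id')['total_fee'])==newdf['payment']].assign(payment=np.nan)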

Note: the code above uses the assign() method to set the payment column of the selected rows to NaN.

Now use pd.tseries.offsets.DateOffset together with the apply() method:

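# look up term_months by agreement_id so each date gets the offset of its own agreement
newdf['date']=newdf['date']+newdf['agreement_id'].map(agreement.set_index('agreement_id')['term_months']).apply(lambda x:pd.tseries.offsets.DateOffset(months=x))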

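Note: The code above typically emits a pandas PerformanceWarning, because the offsets are stored as Python objects and added element by element rather than in a vectorized way. It is only a warning, not an error, and the resulting dates are still correct.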
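
If the warning is a nuisance, one possible alternative is to add the offsets one row at a time in plain Python, so that each addition is a single Timestamp plus a single DateOffset. The sketch below uses two hypothetical inputs, `first_payment` and `term_months`, that mirror agreements C and D. It is an illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical two-row inputs mirroring agreements C and D above.
first_payment = pd.Series(pd.to_datetime(["2021-01-21", "2021-03-04"]))
term_months = pd.Series([6, 6])

# Add the offset row by row: each element is a Timestamp + DateOffset,
# so no object-dtype array is ever added to a datetime column.
end_dates = pd.Series(
    [d + pd.DateOffset(months=int(m)) for d, m in zip(first_payment, term_months)],
    index=first_payment.index,
)
print(end_dates)
```

This trades a little speed for clarity, which is usually fine when there are only a handful of agreements.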

Finally, use the concat() and fillna() methods:

result=pd.concat((payments,newdf),ignore_index=True).fillna(0)

Now if you print result, you will get your desired output:

#output

    cust_id agreement_id       date  payment
0         1            A 2020-12-01    200.0
1         1            A 2021-02-02    200.0
2         1            A 2021-02-03    100.0
3         1            A 2021-05-01    200.0
4         1            B 2021-01-02     50.0
5         1            B 2021-01-09     20.0
6         1            B 2021-03-01     80.0
7         1            B 2021-04-23     90.0
8         2            C 2021-01-21    600.0
9         3            D 2021-03-04    150.0
10        3            D 2021-05-03    150.0
11        2            C 2021-07-21      0.0
12        3            D 2021-09-04      0.0

Note: If you want exactly the same output, use the astype() method to change the payment column dtype from float to int:

result['payment']=result['payment'].astype(int)
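
The same steps can also be collected into one function that joins the two frames on agreement_id with merge(), so the comparison does not depend on row order. This is a sketch under stated assumptions, not the answer's original code: the function name `add_closing_rows` and the intermediate names `summary`, `paid` and `extra` are made up for illustration, and the sample data is copied from the question.

```python
import pandas as pd

def add_closing_rows(payments, agreement):
    """Append a zero-payment row for every fully paid agreement."""
    # Per agreement: total paid, first payment date, and customer.
    summary = (
        payments.groupby("agreement_id")
        .agg(payment=("payment", "sum"), date=("date", "min"), cust_id=("cust_id", "first"))
        .reset_index()
        .merge(agreement[["agreement_id", "term_months", "total_fee"]], on="agreement_id")
    )
    # Keep only agreements whose payments add up to the total fee.
    paid = summary[summary["payment"] == summary["total_fee"]]
    # New rows: zero payment, dated first payment + term_months.
    extra = pd.DataFrame({
        "cust_id": paid["cust_id"].to_list(),
        "agreement_id": paid["agreement_id"].to_list(),
        "date": [d + pd.DateOffset(months=int(m))
                 for d, m in zip(paid["date"], paid["term_months"])],
        "payment": 0,
    })
    return pd.concat([payments, extra], ignore_index=True)

# Sample data copied from the question.
agreement = pd.DataFrame({
    "agreement_id": ["A", "B", "C", "D"],
    "term_months": [24, 6, 6, 6],
    "total_fee": [4800, 300, 600, 300],
})
payments = pd.DataFrame({
    "cust_id": [1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 3],
    "agreement_id": list("AAAABBBBCDD"),
    "date": pd.to_datetime([
        "2020-12-01", "2021-02-02", "2021-02-03", "2021-05-01",
        "2021-01-02", "2021-01-09", "2021-03-01", "2021-04-23",
        "2021-01-21", "2021-03-04", "2021-05-03",
    ]),
    "payment": [200, 200, 100, 200, 50, 20, 80, 90, 600, 150, 150],
})

result = add_closing_rows(payments, agreement)
print(result)
```

Because the new rows are created with an integer 0 rather than NaN, the payment column should stay an integer column and no fillna() or astype() step is needed here.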
