
Pandas ffill and bfill don't work together

I am having trouble making bfill and ffill work within the same dataset.

I have a merged dataset similar to the one below. All rows have a Project Code and Date, but rows where spending was recorded before the Start Date or after the End Date of a subscription do not have a Subscription Code.

Project Code     Start Date     End Date     Subscription Code     Date     Recorded Spending 
   349                                                            8/1/19          50
   349             9/1/18        9/1/19          349A             3/1/19          88
   349             9/1/18        9/1/19          349A             8/1/19          
   349             9/1/19        9/1/20          349B             10/1/19         120
   349                                                            10/1/20         22

I would like to extend the Subscription Code values so that all spending before the official start of the project is counted under the first subscription code, and any spending after the official completion of the project is counted under the last subscription code.

In my solution I have found that I can EITHER ffill or bfill, whichever comes first. So the code below results in forward-filled Subscription Codes, but codes are never back-filled.

    df.sort_values(by=['Project Code','Date'], inplace=True)
    #forward-fill subscription code
    df.loc[:,['Subscription Code']] = df.loc[:,['Subscription Code']].ffill()
    
    #remove if the project code does not match subscription code
    df['Subscription Code'] = np.where(df['Subscription Code'].str[:3] != df['Project Code'], '', df['Subscription Code'])

    df.loc[:,['Subscription Code']] = df.loc[:,['Subscription Code']].bfill()

    #remove if the project code does not match subscription code
    df['Subscription Code'] = np.where(df['Subscription Code'].str[:3] != df['Project Code'], '', df['Subscription Code'])

How do I combine these so that I can BOTH ffill and bfill?
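One possible explanation, sketched on a hypothetical four-row miniature of the data: the cleanup step after ffill writes an empty string `''` into the rows that do not match, and an empty string is not a missing value, so the later bfill has nothing to fill. The sketch assumes Project Code is stored as a string; the names `codes`, `mismatch`, `cleaned`, `after_bfill` and `fixed` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the merged data; only the relevant columns are kept.
df = pd.DataFrame({
    "Project Code": ["349", "349", "349", "349"],
    "Subscription Code": [np.nan, "349A", "349B", np.nan],
})

# Forward fill: the trailing gap is filled, the leading gap stays NaN.
codes = df["Subscription Code"].ffill()

# Same cleanup step as in the question: mismatches become '' rather than NaN.
mismatch = codes.str[:3] != df["Project Code"]
cleaned = pd.Series(np.where(mismatch, "", codes), index=df.index)

# The leading row now holds '' (not a missing value), so bfill skips it.
after_bfill = cleaned.bfill()

# Keeping NaN as the placeholder leaves the gap visible to bfill.
fixed = codes.where(~mismatch).bfill()
```

If this is what happens in the real data, replacing `''` with `np.nan` in both `np.where` calls should let the second fill take effect.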

In this case I knew that the first subscription code would be the Project Code + 'A', so I was able to use this code to get what I needed:

    import numpy as np
    import pandas as pd

    def fill_empty_subscription_code(df):
        #Sort so that the fill follows date order within each project
        df.sort_values(by=['Project Code', 'Date'], inplace=True)
        #If there is spending recorded after the last subscription end date then include it with the last subscription
        df.loc[:, 'Subscription Code'] = df.groupby(['Project Code'])['Subscription Code'].ffill()
        #If there is work done before the first subscription start date, include it in the first subscription code (Project Code with 'A' appended)
        df.loc[:, 'Subscription Code'] = np.where(pd.isna(df['Subscription Code']),
                                                  df['Project Code'] + 'A',
                                                  df['Subscription Code'])

        return df
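If the first code cannot be assumed to end in 'A', a more general variant is to forward-fill and then back-fill within each project. The following is only a sketch under assumptions: the function name `fill_subscription_code` and the sample rows are hypothetical, and Date is assumed to be a real datetime column so that sorting is chronological.

```python
import numpy as np
import pandas as pd

def fill_subscription_code(df):
    """Fill missing subscription codes per project: forward first, then backward."""
    out = df.sort_values(by=["Project Code", "Date"]).copy()
    grouped = out.groupby("Project Code")["Subscription Code"]
    # ffill covers spending after the last subscription,
    # bfill covers spending before the first one.
    out["Subscription Code"] = grouped.transform(lambda s: s.ffill().bfill())
    return out

# Hypothetical sample with two projects, to show codes do not leak across projects.
df = pd.DataFrame({
    "Project Code": ["349", "349", "349", "349", "350", "350"],
    "Subscription Code": [np.nan, "349A", "349B", np.nan, np.nan, "350A"],
    "Date": pd.to_datetime(
        ["8/1/18", "3/1/19", "10/1/19", "10/1/20", "1/1/19", "6/1/19"],
        format="%m/%d/%y",
    ),
})

filled = fill_subscription_code(df)
```

Grouping by Project Code makes the separate "remove if the project code does not match" step unnecessary, because a fill never crosses from one project into another.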


 