简体   繁体   中英

Modify pandas multiindex value for a particular group

Assume I have a pandas MultiIndex DataFrame with three different levels:

arrays = [['2020-03-30', '2020-03-30', '2020-03-30', '2020-04-15', '2020-04-15', '2020-04-15', '2020-05-10', '2020-05-10', '2020-06-10'], ['Firm1', 'Firm1', 'Firm2', 'Firm1', 'Firm2', 'Firm2', 'Firm1', 'Firm1', 'Firm1'], ['2022-01-01', '2023-02-01', '2021-01-05', '2021-04-01', '2022-01-01', '2024-02-01', '2021-02-05', '2022-04-01', '2022-04-01']]
idx = pd.MultiIndex.from_arrays(arrays, names = ('Buy_date', 'Firm', 'Sell_date'))
df = pd.DataFrame(np.zeros(9), index = idx)
df

                               0
Buy_date   Firm  Sell_date      
2020-03-30 Firm1 2022-01-01  0.0
                 2023-02-01  0.0
           Firm2 2021-01-05  0.0
2020-04-15 Firm1 2021-04-01  0.0
           Firm2 2022-01-01  0.0
                 2024-02-01  0.0
2020-05-10 Firm1 2021-02-05  0.0
                 2022-04-01  0.0
2020-06-10 Firm1 2022-04-01  0.0

For a given buy date, I want to change the Maximum Sell_date index value of Firm1 and set it to 01.01.2030. So the output should be:

Buy_date   Firm  Sell_date      
2020-03-30 Firm1 2022-01-01  0.0
                 2030-01-01  0.0
           Firm2 2021-01-05  0.0
2020-04-15 Firm1 2030-01-01  0.0
           Firm2 2022-01-01  0.0
                 2024-02-01  0.0
2020-05-10 Firm1 2021-02-05  0.0
                 2030-01-01  0.0
2020-06-10 Firm1 2030-01-01  0.0

Does anybody have an idea how to do this?

Here's one way using set_index and append a new index level with the required modifications. Also used groupby.transform to get the max dates:

max_dates = df.reset_index(level='Sell_date').groupby(level=[0,1])['Sell_date'].transform('max')
df = df.set_index(df.index.get_level_values(2)
                  .where((df.index.get_level_values(1)!='Firm1') | 
                         (df.index.get_level_values(2)!=max_dates), '2030-01-01'), 
                  append=True).droplevel(2)

A more straightforward way is to use to reset_index , modify the DataFrame and set the index with the new columns:

df1 = df.reset_index()
max_date = df1.groupby(['Buy_date','Firm'])['Sell_date'].transform('max')
df1.loc[df1['Sell_date'].eq(max_date) & df1['Firm'].eq('Firm1'), 'Sell_date'] = '2030-01-01'
df = df1.set_index(['Buy_date','Firm','Sell_date'])

Output:

                               0
Buy_date   Firm  Sell_date      
2020-03-30 Firm1 2022-01-01  0.0
                 2030-01-01  0.0
           Firm2 2021-01-05  0.0
2020-04-15 Firm1 2030-01-01  0.0
           Firm2 2022-01-01  0.0
                 2024-02-01  0.0
2020-05-10 Firm1 2021-02-05  0.0
                 2030-01-01  0.0
2020-06-10 Firm1 2030-01-01  0.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM