简体   繁体   中英

create new column based on other columns

I have this dataframe

import pandas as pd

df = pd.DataFrame({'Found':['A','A','A','A','A','B','B','B'],
               'Date':['14/10/2021','19/10/2021','29/10/2021','30/09/2021','20/09/2021','20/10/2021','29/10/2021','15/10/2021'],
               'LastDayMonth':['29/10/2021','29/10/2021','29/10/2021','30/09/2021','30/09/2021','29/10/2021','29/10/2021','29/10/2021'],
               'Mark':[1,2,3,4,3,1,2,3]

              })
print(df)

Found        Date LastDayMonth  Mark
0     A  14/10/2021   29/10/2021     1
1     A  19/10/2021   29/10/2021     2
2     A  29/10/2021   29/10/2021     3
3     A  30/09/2021   30/09/2021     4
4     A  20/09/2021   30/09/2021     3
5     B  20/10/2021   29/10/2021     1
6     B  29/10/2021   29/10/2021     2
7     B  15/10/2021   29/10/2021     3

based on this dataframe I need to create a new column that is the "Mark" of the last day of the month to form this new column.

that is, I need the value of the 'Mark' column of the last day of the month of each Found

Result Example

Found        Date LastDayMonth  Mark  Mark_LastDayMonth
0     A  14/10/2021   29/10/2021     1                  3
1     A  19/10/2021   29/10/2021     2                  3
2     A  29/10/2021   29/10/2021     3                  3
3     A  30/09/2021   30/09/2021     4                  4
4     A  20/09/2021   30/09/2021     3                  4
5     B  20/10/2021   29/10/2021     1                  2
6     B  29/10/2021   29/10/2021     2                  2
7     B  15/10/2021   29/10/2021     3                  2

how could I do that? a function? I need to do this for a large amount of data

IIUC, for each Found , you check which is the Mark value at the last day of the month and you assign that value to Mark_LastDayMonth .

You can proceed the following:

# 1. Select last days
mark_last_day = df.loc[df.apply(lambda x: x['Date']==x['LastDayMonth'], 1)]

# 2. Merge them to the original dataset, renaming the right Mark value to Mark_LastDayMonth
df.merge(mark_last_day[['Found', 'LastDayMonth', 'Mark']],
     how='left',
     on=['Found', 'LastDayMonth'],
     suffixes=('', '_LastDayMonth'))

# Output
Found   Date    LastDayMonth    Mark    Mark_LastDayMonth
0   A   14/10/2021  29/10/2021  1       3
1   A   19/10/2021  29/10/2021  2       3
2   A   29/10/2021  29/10/2021  3       3
3   A   30/09/2021  30/09/2021  4       4
4   A   20/09/2021  30/09/2021  3       4 
5   B   20/10/2021  29/10/2021  1       2
6   B   29/10/2021  29/10/2021  2       2
7   B   15/10/2021  29/10/2021  3       2

The first step, that is this line of code df.loc[df.apply(lambda x: x['Date']==x['LastDayMonth'], 1)] will produce the following:

    Found   Date    LastDayMonth    Mark
2   A   29/10/2021  29/10/2021      3
3   A   30/09/2021  30/09/2021      4
6   B   29/10/2021  29/10/2021      2

This output is then merged to the orginal dataframe df

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM