
Add conditional column to Pandas Data Frame using else if logic - Python

need some help...

Below is my DataFrame:

+--------------+----------------+---------------+-----------------+------------+
| Planned_Date | Planned_Date_2 | Complete_Date | Complete_Date_2 | Alias_Date |
+--------------+----------------+---------------+-----------------+------------+
| 01/01/1800   |                | 03/09/2020    |                 | 03/09/2020 |
| 01/01/1800   | 20/09/2020     |               |                 | 20/09/2020 |
|              |                |               | 28/09/2020      | 28/09/2020 |
| 04/10/2020   |                |               |                 | 04/10/2020 |
+--------------+----------------+---------------+-----------------+------------+

I'm trying to create a new column (Alias_Date) using conditional logic against a few date columns.

The logic is as follows:

if Planned_Date = 01/01/1800
  and Planned_Date_2 = null
    then Complete_Date

else if Planned_Date  = 01/01/1800
  and Planned_Date_2  <> null
    then Planned_Date_2 

else if Planned_Date = null
    then Complete_Date_2

else Planned_Date

How can I do this efficiently using python/pandas/numpy or any other recommended means?

Forward-fill the missing values across each row, then select the last column by position with DataFrame.iloc:

df['Alias_Date'] = df.ffill(axis=1).iloc[:, -1]
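
A minimal, self-contained sketch of the line above. The sample frame is rebuilt from the table in the question, and it assumes the blank cells are real missing values (NaN) rather than empty strings:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    'Planned_Date':    ['01/01/1800', '01/01/1800', np.nan, '04/10/2020'],
    'Planned_Date_2':  [np.nan, '20/09/2020', np.nan, np.nan],
    'Complete_Date':   ['03/09/2020', np.nan, np.nan, np.nan],
    'Complete_Date_2': [np.nan, np.nan, '28/09/2020', np.nan],
})

# If the blanks are empty strings instead, convert them first:
# df = df.replace('', np.nan)

# Fill each row from left to right, then take the last column,
# i.e. the right-most non-missing value in the row.
df['Alias_Date'] = df.ffill(axis=1).iloc[:, -1]
print(df['Alias_Date'].tolist())
```

This shortcut does not test for 01/01/1800 at all; it gives the expected values here because each sample row has its wanted date in the right-most filled column.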

If the DataFrame may contain other columns, select the date columns with a list:

cols = ['Planned_Date', 'Planned_Date_2', 'Complete_Date', 'Complete_Date_2']


df['Alias_Date'] = df[cols].ffill(axis=1).iloc[:, -1]

Or first 4 columns:

df['Alias_Date'] = df.iloc[:, :4].ffill(axis=1).iloc[:, -1]

Or all columns whose name contains Date (this would also match an Alias_Date column if one already exists):

df['Alias_Date'] = df.filter(like='Date').ffill(axis=1).iloc[:, -1]

EDIT:

A solution that follows the conditions from the question explicitly, using numpy.select:

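import numpy as np
import pandas as pd

cols = ['Planned_Date', 'Planned_Date_2', 'Complete_Date', 'Complete_Date_2']

# Parse the day-first date strings; missing values become NaT
df[cols] = df[cols].apply(pd.to_datetime, dayfirst=True)

# Boolean masks for the conditions in the question
m1 = df['Planned_Date'].eq('1800-01-01')
m2 = df['Planned_Date_2'].isna()
m3 = df['Planned_Date'].isna()

# Conditions are checked in order and the first match wins;
# rows that match none of them get Planned_Date
df['Alias_Date'] = np.select([m1 & m2, m1 & ~m2, m3],
                             [df['Complete_Date'],
                              df['Planned_Date_2'],
                              df['Complete_Date_2']], default=df['Planned_Date'])
print(df)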
  Planned_Date Planned_Date_2 Complete_Date Complete_Date_2 Alias_Date
0   1800-01-01            NaT    2020-09-03             NaT 2020-09-03
1   1800-01-01     2020-09-20           NaT             NaT 2020-09-20
2          NaT            NaT           NaT      2020-09-28 2020-09-28
3   2020-10-04            NaT           NaT             NaT 2020-10-04
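
The two approaches agree on the sample data but are not equivalent in general. The sketch below uses a hypothetical row that is not in the question's data (a real planned date together with a completion date) to show where they could differ; treat it as an illustration under that assumption, not as part of the original answer:

```python
import numpy as np
import pandas as pd

cols = ['Planned_Date', 'Planned_Date_2', 'Complete_Date', 'Complete_Date_2']

# Hypothetical row: Planned_Date is a real date and Complete_Date is also set.
raw = pd.DataFrame([['04/10/2020', None, '06/10/2020', None]], columns=cols)
df = raw.apply(pd.to_datetime, dayfirst=True)

m1 = df['Planned_Date'].eq(pd.Timestamp('1800-01-01'))
m2 = df['Planned_Date_2'].isna()
m3 = df['Planned_Date'].isna()

# Explicit rules: no condition matches, so the default (Planned_Date) is used.
by_rules = np.select(
    [m1 & m2, m1 & ~m2, m3],
    [df['Complete_Date'], df['Planned_Date_2'], df['Complete_Date_2']],
    default=df['Planned_Date'],
)

# Forward fill: takes the right-most non-missing value, here Complete_Date.
by_ffill = df[cols].ffill(axis=1).iloc[:, -1]

print(pd.Timestamp(by_rules[0]).date(), by_ffill.iloc[0].date())
```

If rows like this can occur in the real data, the numpy.select version is the one that matches the stated logic.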
