I have a dataframe that looks like this:
orderID m1 m2 m3
1 2020-03-04 2020-03-04 NaT
2 2020-03-08 NaT NaT
And i want to create a new column that displays the latest milestone (mn) available for an Order.
The output would look like something like this
orderID m1 m2 m3 last_m_available
1 2020-03-04 2020-03-04 NaT m2
2 2020-03-08 NaT NaT m1
How would i do this with python ?
You can swap order of columns, test not missing values and use DataFrame.idxmax
:
#if orderID is not index
df = df.set_index('orderID')
df = df.apply(pd.to_datetime)
df['last_m_available'] = df.iloc[:, ::-1].notna().idxmax(axis=1)
print (df)
m1 m2 m3 last_m_available
orderID
1 2020-03-04 2020-03-04 NaT m2
2 2020-03-08 NaT NaT m1
If possible some rows with only missing values:
df = df.apply(pd.to_datetime)
mask = df.iloc[:, ::-1].notna()
df['last_m_available'] = np.where(mask.any(axis=1), mask.idxmax(axis=1), np.nan)
print (df)
m1 m2 m3 last_m_available
orderID
1 2020-03-04 2020-03-04 NaT m2
2 NaT NaT NaT NaN
You can use dataframe.dropna()
to get rid of empty columns for your list.
cols = df.dropna().columns
df['last_m_available'] = cols
print(df)
m1 m2 m3 last_m_available
orderID
1 2020-03-04 2020-03-04 NaN m2
2 2020-03-08 NaN NaN m1
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.