How to input data in a DataFrame based on multiple conditions

Question

ID	Create Date	Last Modify Date
1	03/31/2021 8:56	03/31/2021 09:46
1	03/31/2021 5:56	03/31/2021 09:48
2	03/31/2021 0:23	03/31/2021 09:47
2	03/31/2021 6:56	03/31/2021 09:46
3	03/31/2021 7:32	03/31/2021 09:46
3	03/31/2021 8:45	03/31/2021 09:46

Hello,

For the table above, I need to mark the oldest Create Date for each ID with the comment "Minimal".

import os
from tkinter import filedialog

import pandas as pd

inputFolder = os.getcwd()
filename = filedialog.askopenfilename(title="Select file:", filetypes=(("xlsx files", ".xlsx"), ("all files", "*.*")), initialdir=inputFolder)
df = pd.read_excel(filename, index_col=None, header=0)

df.loc[(df.groupby(['BB Global ID']).agg({'Create Date': min})), 'Comment'] = 'Minimal'

print(df)

I tried to do it with the pandas df.loc indexer, but I get the error below:

KeyError: "None of [Index([('C', 'r', 'e', 'a', 't', 'e', ' ', 'D', 'a', 't', 'e')], dtype='object')] are in the [index]"
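To see why this fails, it helps to look at what the agg call actually returns. The sketch below rebuilds the sample table in memory instead of reading the Excel file; it assumes the grouping column is called ID, as in the table (the question's code uses 'BB Global ID'), and parses the dates explicitly. agg collapses each group to a single row indexed by ID, so the result has 3 rows while df has 6. It is not a row-by-row True/False mask, which is what df.loc needs in order to select rows.

```python
import pandas as pd

# Sample data from the question; "ID" stands in for the real "BB Global ID" column.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Create Date": pd.to_datetime(
        ["03/31/2021 8:56", "03/31/2021 5:56",
         "03/31/2021 0:23", "03/31/2021 6:56",
         "03/31/2021 7:32", "03/31/2021 8:45"],
        format="%m/%d/%Y %H:%M",
    ),
})

# agg collapses each group to one row: the result is indexed by ID, not by df's rows.
agg = df.groupby("ID").agg({"Create Date": "min"})
print(type(agg).__name__, agg.shape)  # DataFrame (3, 1)
print(df.shape)                       # (6, 2)
```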

Below is the final result I want to achieve:

ID	Create Date	Last Modify Date	Comment
1	03/31/2021 8:56	03/31/2021 09:46
1	03/31/2021 5:56	03/31/2021 09:48	Minimal
2	03/31/2021 0:23	03/31/2021 09:47	Minimal
2	03/31/2021 6:56	03/31/2021 09:46
3	03/31/2021 7:32	03/31/2021 09:46	Minimal
3	03/31/2021 8:45	03/31/2021 09:46
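Before applying any answer, it may be worth checking how read_excel loaded Create Date. If the cells were stored as text rather than as real Excel dates (the sample writes some hours without a leading zero, e.g. 8:56 next to 09:46), min compares the strings character by character and can pick the wrong row. A small sketch of the difference, using a hypothetical 10:05 timestamp that is not part of the question's data:

```python
import pandas as pd

# Two timestamps on the same day, formatted like the question's table.
raw = pd.Series(["03/31/2021 8:56", "03/31/2021 10:05"])

# As text, min() compares character by character: "1" sorts before "8",
# so the later time is reported as the minimum.
print(raw.min())     # 03/31/2021 10:05

# Parsed as datetimes, min() compares actual points in time.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M")
print(parsed.min())  # 2021-03-31 08:56:00
```

If the column turns out to be text, converting it once with pd.to_datetime before grouping avoids this.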

Answer 1

Use GroupBy.transform to repeat the aggregated value on every row of its group, so it can be compared with the original column:
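On the sample table, what transform returns looks like this. This is a sketch that rebuilds the data in memory and assumes the grouping column is called ID (the real file uses 'BB Global ID'). Unlike agg, transform("min") returns a Series with the same index as df, holding each row's group minimum, so comparing it with the original column yields one True/False value per row.

```python
import pandas as pd

# Sample data from the question; "ID" stands in for the real "BB Global ID" column.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Create Date": pd.to_datetime(
        ["03/31/2021 8:56", "03/31/2021 5:56",
         "03/31/2021 0:23", "03/31/2021 6:56",
         "03/31/2021 7:32", "03/31/2021 8:45"],
        format="%m/%d/%Y %H:%M",
    ),
})

# One value per original row: the minimum Create Date of that row's group.
group_min = df.groupby("ID")["Create Date"].transform("min")

# Same index as df, so the comparison lines up row by row.
mask = group_min.eq(df["Create Date"])
print(mask.tolist())  # [False, True, True, False, True, False]
```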

# True for every row whose Create Date equals the minimum of its ID group
mask = df.groupby(['BB Global ID'])['Create Date'].transform('min').eq(df['Create Date'])
df.loc[mask, 'Comment'] = 'Minimal'

Or:

import numpy as np

df['Comment'] = np.where(mask, 'Minimal', '')
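A different technique, not part of the answer above, is GroupBy.idxmin, which returns the index label of the oldest row in each group; those labels can be passed straight to df.loc. The sketch again rebuilds the sample table and assumes the grouping column is called ID.

```python
import pandas as pd

# Sample data from the question; "ID" stands in for the real "BB Global ID" column.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Create Date": pd.to_datetime(
        ["03/31/2021 8:56", "03/31/2021 5:56",
         "03/31/2021 0:23", "03/31/2021 6:56",
         "03/31/2021 7:32", "03/31/2021 8:45"],
        format="%m/%d/%Y %H:%M",
    ),
})

# For each ID, the index label of the row with the oldest date
# (on ties, only the first such row).
oldest_rows = df.groupby("ID")["Create Date"].idxmin()

# Start with blank comments, then mark the selected rows.
df["Comment"] = ""
df.loc[oldest_rows, "Comment"] = "Minimal"
print(df)
```

The two approaches differ only when several rows share the oldest date within one ID: the transform mask flags all of them, while idxmin flags just the first.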

solution1
3 ACCEPTED 2021-04-16 07:09:34