How to fill in a DataFrame column based on multiple conditions

ID  Create Date      Last Modify Date
1   03/31/2021 8:56  03/31/2021 09:46
1   03/31/2021 5:56  03/31/2021 09:48
2   03/31/2021 0:23  03/31/2021 09:47
2   03/31/2021 6:56  03/31/2021 09:46
3   03/31/2021 7:32  03/31/2021 09:46
3   03/31/2021 8:45  03/31/2021 09:46

Hello,

For the table above, I need to mark the oldest Create Date for each ID with the comment "Minimal".

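import os
from tkinter import filedialog  # assumed source of filedialog; not shown in the original post

import pandas as pd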

inputFolder = os.getcwd()
filename = filedialog.askopenfilename(title="Select file:", filetypes=(("xlsx files", ".xlsx"), ("all files", "*.*")), initialdir = inputFolder)
df = pd.read_excel(filename, index_col=None, header=0) 

df.loc[(df.groupby(['BB Global ID']).agg({'Create Date': min})), 'Comment'] = 'Minimal'

print(df)

I tried to do this with pandas' df.loc, but I get the error below.

KeyError: "None of [Index([('C', 'r', 'e', 'a', 't', 'e', ' ', 'D', 'a', 't', 'e')], dtype='object')] are in the [index]"

Below is the final result I want to achieve:

ID  Create Date      Last Modify Date  Comment
1   03/31/2021 8:56  03/31/2021 09:46
1   03/31/2021 5:56  03/31/2021 09:48  Minimal
2   03/31/2021 0:23  03/31/2021 09:47  Minimal
2   03/31/2021 6:56  03/31/2021 09:46
3   03/31/2021 7:32  03/31/2021 09:46  Minimal
3   03/31/2021 8:45  03/31/2021 09:46

Use GroupBy.transform to broadcast the aggregated value back to every row of its group, so that it can be compared with the original column:

mask = df.groupby(['BB Global ID'])['Create Date'].transform(min).eq(df['Create Date'])
df.loc[mask, 'Comment'] = 'Minimal'
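To try this without the Excel file, here is a minimal sketch that rebuilds the sample table from the question by hand. It assumes the grouping column is called "ID" as in the table (the question's code uses 'BB Global ID'), and it parses Create Date with pd.to_datetime so that "oldest" is decided chronologically rather than by string order. With read_excel the column may already be a datetime, in which case that step is unnecessary.

```python
import pandas as pd

# Sample data copied from the question. The column name "ID" is taken from
# the table; the question's own code groups by 'BB Global ID' instead.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Create Date": ["03/31/2021 8:56", "03/31/2021 5:56", "03/31/2021 0:23",
                    "03/31/2021 6:56", "03/31/2021 7:32", "03/31/2021 8:45"],
    "Last Modify Date": ["03/31/2021 09:46", "03/31/2021 09:48", "03/31/2021 09:47",
                         "03/31/2021 09:46", "03/31/2021 09:46", "03/31/2021 09:46"],
})

# Parse the strings so the minimum is the earliest timestamp, not the
# lexicographically smallest string.
df["Create Date"] = pd.to_datetime(df["Create Date"], format="%m/%d/%Y %H:%M")

# transform("min") returns one value per original row: the minimum of its group.
group_min = df.groupby("ID")["Create Date"].transform("min")

# Rows whose own Create Date equals the group minimum get the comment.
mask = df["Create Date"].eq(group_min)
df.loc[mask, "Comment"] = "Minimal"

print(df)
```

Rows that are not the group minimum are left as NaN in Comment. If two rows in the same group share the earliest Create Date, both are marked.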

Or:

import numpy as np

df['Comment'] = np.where(mask, 'Minimal', '')
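As for why the original attempt fails: agg returns one row per group, indexed by the group key, so it cannot serve as a row mask for df.loc. The KeyError appears to come from pandas treating that result as a list of labels to look up. The sketch below, using the question's sample values and assuming the key column is named "ID", compares the shapes of the two results.

```python
import pandas as pd

# Sample values from the question; "ID" is an assumed column name.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Create Date": pd.to_datetime(
        ["03/31/2021 8:56", "03/31/2021 5:56", "03/31/2021 0:23",
         "03/31/2021 6:56", "03/31/2021 7:32", "03/31/2021 8:45"],
        format="%m/%d/%Y %H:%M",
    ),
})

# agg collapses each group to a single row, indexed by ID.
per_group = df.groupby("ID").agg({"Create Date": "min"})

# transform keeps the original index and repeats the group minimum on every row.
per_row = df.groupby("ID")["Create Date"].transform("min")

print(per_group.shape)                  # one row per ID
print(per_row.shape)                    # one value per original row
print(per_row.index.equals(df.index))   # aligned with df, so it can be compared row by row
```

Because per_row shares df's index, comparing it with df['Create Date'] gives a boolean Series that df.loc and np.where can both use.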
