Get Column Name for specific value in each row in Python Pandas
I have the below dataframe called df:
Id | Stage1 | Stage2 | Stage3 |
---|---|---|---|
1 | 2022-02-01 | 2020-04-03 | 2022-06-07 |
2 | 2023-06-07 | 2020-03-01 | 2020-09-03 |
3 | 2023-02-04 | 2023-06-07 | 2022-06-07 |
I need to calculate the max date for each ID and its respective Stage. So for IDs 1, 2 and 3 the Stages I need are Stage 3, Stage 1 and Stage 2 respectively. I started this process by calculating the max date in each row first with the below code:
df2 = df[['Stage1', 'Stage2', 'Stage3', 'Stage4', 'Stage5']]
lis = list(df2.max(axis=1))
The lis variable has the max date stored for each row. Now, with each max date, I need to get the Stage name for that row.
The code below looks each max date up across the whole df instead of within its own row, so it returns every column that contains that date anywhere.
new_lis = []
for i in lis:
    new_lis.append(df.columns[df.isin([i]).any()])
How do I fix this? The output I need is "Stage 3", "Stage 1" and "Stage 2" for IDs 1, 2 and 3 respectively.
Let's try idxmax(axis=1), which returns the column label of the maximum value in each row:
out = (df.filter(like='Stage')
         .apply(pd.to_datetime)
         .idxmax(axis=1))
print(out)
0 Stage3
1 Stage1
2 Stage2
dtype: object
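For reference, here is a minimal self-contained sketch of the same idea that rebuilds the sample data from the question and attaches both the stage name and the max date to each Id. The names `stages`, `result`, `MaxStage` and `MaxDate` are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Stage1": ["2022-02-01", "2023-06-07", "2023-02-04"],
    "Stage2": ["2020-04-03", "2020-03-01", "2023-06-07"],
    "Stage3": ["2022-06-07", "2020-09-03", "2022-06-07"],
})

# Keep only the stage columns and parse them as datetimes
stages = df.filter(like="Stage").apply(pd.to_datetime)

# idxmax(axis=1) gives the column label of the row maximum,
# max(axis=1) gives the maximum date itself
result = df[["Id"]].assign(
    MaxStage=stages.idxmax(axis=1),
    MaxDate=stages.max(axis=1),
)
print(result)
```

Computing both values from the same parsed frame keeps the stage name and the date aligned on the row index, so no lookup of the date back into df is needed.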
If every stage column is NaT for some row, you can drop that row before calling idxmax:
out = (df.filter(like='Stage')
         .apply(pd.to_datetime)
         .dropna(how='all')
         .idxmax(axis=1))
Input dataframe:
Id Stage1 Stage2 Stage3
0 1 2022-02-01 2020-04-03 2022-06-07
1 2 2023-06-07 2020-03-01 2020-09-03
2 3 2023-02-04 2023-06-07 2022-06-07
3 4 NaN NaN NaN
4 5 NaT 2023-06-07 2022-06-07
Output (note that index 3 is dropped):
0 Stage3
1 Stage1
2 Stage2
4 Stage2
dtype: object
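If the all-empty rows should stay in the result rather than disappear, one possible variation is to reindex the output back onto the original index so those rows get a missing value. This is a sketch under that assumption; `stages`, `max_stage` and the `MaxStage` column are hypothetical names.

```python
import pandas as pd

# Sample data from the answer, including an all-missing row (Id 4)
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Stage1": ["2022-02-01", "2023-06-07", "2023-02-04", None, None],
    "Stage2": ["2020-04-03", "2020-03-01", "2023-06-07", None, "2023-06-07"],
    "Stage3": ["2022-06-07", "2020-09-03", "2022-06-07", None, "2022-06-07"],
})

stages = df.filter(like="Stage").apply(pd.to_datetime)

# Drop all-NaT rows before idxmax, then restore them as missing values
max_stage = (stages.dropna(how="all")
                   .idxmax(axis=1)
                   .reindex(df.index))

df["MaxStage"] = max_stage
print(df[["Id", "MaxStage"]])
```

Dropping the empty rows first avoids calling idxmax on a row with no valid dates, and reindex puts a missing value back at index 3 so the result lines up with df.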