
Get Column Name for specific value in each row in Python Pandas

I have the dataframe below, called df:

Id  Stage1      Stage2      Stage3
1   2022-02-01  2020-04-03  2022-06-07
2   2023-06-07  2020-03-01  2020-09-03
3   2023-02-04  2023-06-07  2022-06-07

I need to find the max date for each ID and the stage it belongs to. So for orders (IDs) 1, 2 and 3, the stages I need are Stage3, Stage1 and Stage2 respectively. I started by calculating the max date in each row with the code below:

# Row-wise max over the stage columns (the sample above has only Stage1-Stage3)
df2 = df[['Stage1', 'Stage2', 'Stage3']]
lis = list(df2.max(axis=1))

The lis variable now holds the max date for each row. Next, for each max date, I need to get the name of the stage column it came from in that row.

The code below, however, looks for each max date across the whole df rather than within its own row:

new_lis = []
for i in lis:
    new_lis.append(df.columns[df.isin([i]).any()])
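To see why this matches across rows: df.isin([i]) returns a boolean frame, and .any() then reduces down each column (its default is axis=0), so a column is selected if the date appears in any row of it. A small check on the sample data, rebuilt by hand with the dates kept as strings (an assumption about the real dtypes):

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (dates as strings)
df = pd.DataFrame({
    'Id': [1, 2, 3],
    'Stage1': ['2022-02-01', '2023-06-07', '2023-02-04'],
    'Stage2': ['2020-04-03', '2020-03-01', '2023-06-07'],
    'Stage3': ['2022-06-07', '2020-09-03', '2022-06-07'],
})

# The max date of Id 3 is '2023-06-07', which sits in Stage2 for that row,
# but the same date also appears in Stage1 for Id 2.
hits = df.isin(['2023-06-07']).any()  # one flag per column, over ALL rows
matched = list(df.columns[hits.to_numpy()])
print(matched)  # ['Stage1', 'Stage2']
```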

How do I fix this? The output I need is "Stage3", "Stage1" and "Stage2" for orders 1, 2 and 3 respectively.

Let's try idxmax(axis=1):

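import pandas as pd

out = (df.filter(like='Stage')      # keep only the Stage* columns
       .apply(pd.to_datetime)       # compare real dates, not strings
       .idxmax(axis=1))             # column label of the max in each row
print(out)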

0    Stage3
1    Stage1
2    Stage2
dtype: object
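For comparison, the question's max-then-lookup approach also works once each row is compared only with its own maximum. A minimal sketch, assuming the three-row sample rebuilt by hand; unlike idxmax, which returns the first matching column, it lists every stage that ties for the max in a row:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'Id': [1, 2, 3],
    'Stage1': ['2022-02-01', '2023-06-07', '2023-02-04'],
    'Stage2': ['2020-04-03', '2020-03-01', '2023-06-07'],
    'Stage3': ['2022-06-07', '2020-09-03', '2022-06-07'],
})

stages = df.filter(like='Stage').apply(pd.to_datetime)
row_max = stages.max(axis=1)

# Compare each row only with its own maximum; axis=0 aligns row_max on the row index
is_max = stages.eq(row_max, axis=0)

# One list of matching stage names per row (several names if dates tie)
new_lis = [list(stages.columns[mask]) for mask in is_max.to_numpy()]
print(new_lis)  # [['Stage3'], ['Stage1'], ['Stage2']]
```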

If a row has NaT in every stage column, idxmax has no valid column to return for it, so drop such rows first:

out = (df.filter(like='Stage')
       .apply(pd.to_datetime)
       .dropna(how='all')
       .idxmax(axis=1))
Input dataframe:

   Id      Stage1      Stage2      Stage3
0   1  2022-02-01  2020-04-03  2022-06-07
1   2  2023-06-07  2020-03-01  2020-09-03
2   3  2023-02-04  2023-06-07  2022-06-07
3   4         NaN         NaN         NaN
4   5         NaT  2023-06-07  2022-06-07

Output (a Series); note that index 3 is dropped:

0    Stage3
1    Stage1
2    Stage2
4    Stage2
dtype: object
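A self-contained reproduction of the NaT example above, with the input rebuilt by hand (missing cells are entered as None, which pd.to_datetime turns into NaT):

```python
import pandas as pd

# Input dataframe from the example above; missing cells entered as None
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'Stage1': ['2022-02-01', '2023-06-07', '2023-02-04', None, None],
    'Stage2': ['2020-04-03', '2020-03-01', '2023-06-07', None, '2023-06-07'],
    'Stage3': ['2022-06-07', '2020-09-03', '2022-06-07', None, '2022-06-07'],
})

out = (df.filter(like='Stage')
       .apply(pd.to_datetime)   # None becomes NaT
       .dropna(how='all')       # drop index 3, where every stage is NaT
       .idxmax(axis=1))         # a single NaT in a kept row (index 4) is skipped
print(out)
```

Only rows that are entirely NaT need dropping: with the default skipna=True, idxmax ignores individual NaT cells, which is why index 4 still resolves to Stage2.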
