
Return column name of the first matching value in each row in a new column

I have a dataframe where the first column is an ID and every other column is a date. Each ID may show the same value in several columns, may have some leading NaN columns, or may have all NaN columns. I'd like to create a new column holding the name of the column where a specific entry first appears.

Sample df:

| id_report | req id | 1-Jan | 2-Jan | 3-Jan | 4-Jan |
| --------- | -------------- | ----- | ----- | ----- | ----- |
| 0   | 12345 | NaN | Pend | Pend | Appr |
| 1   | 12346  | NaN | NaN | NaN | NaN |
| 2   | 12347 | NaN | NaN | Pend | Pend |
| 3   | 12348  | NaN | NaN | NaN | Appr |

I've searched and come up with:

id_report["Pend"] = id_report.apply(lambda x: x == "Pend", axis = 1).idxmax(axis = 1)

But this returns "req id" for every row where "Pend" doesn't appear, and I'd like to keep those positions empty.
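The behavior described above can be reproduced with a short sketch. The frame below is rebuilt by hand from the sample table (so its exact dtypes are an assumption), and `DataFrame.eq` is used as a vectorized stand-in for the row-wise `apply`. On a row that is entirely `False`, `idxmax` has no `True` to find and falls back to the first column label, which here is "req id".

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table above (an assumption about dtypes)
id_report = pd.DataFrame({
    "req id": [12345, 12346, 12347, 12348],
    "1-Jan": [np.nan, np.nan, np.nan, np.nan],
    "2-Jan": ["Pend", np.nan, np.nan, np.nan],
    "3-Jan": ["Pend", np.nan, "Pend", np.nan],
    "4-Jan": ["Appr", np.nan, "Pend", "Appr"],
})

# Boolean frame: True wherever a cell equals "Pend"
mask = id_report.eq("Pend")

# Label of the first True per row; an all-False row yields the first column
first_match = mask.idxmax(axis=1)
print(first_match.tolist())
```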

Desired output:

| id_report | req id | 1-Jan | 2-Jan | 3-Jan | 4-Jan | Pend |
| --------- | ------ | ----- | ----- | ----- | ----- | ----- |
| 0   | 12345 | NaN | Pend | Pend | Appr | 2-Jan |
| 1   | 12346 | NaN | NaN | NaN | NaN | NaN |
| 2   | 12347 | NaN | NaN | Pend | Pend | 3-Jan |
| 3   | 12348 | NaN | NaN | NaN | Appr | NaN |

You could chain a `replace` to your current code:

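import numpy as np

# idxmax falls back to the first column label ('req id') when a row has
# no match, so map that label back to NaN afterwards
id_report['Pend'] = (id_report
   .apply(lambda x: x == 'Pend', axis = 1)
   .idxmax(axis = 1)
   .replace('req id', np.nan)
)

Output:

| id_report | req id | 1-Jan | 2-Jan | 3-Jan | 4-Jan | Pend |
| --------- | ------ | ----- | ----- | ----- | ----- | ----- |
| 0   | 12345 | NaN | Pend | Pend | Appr | 2-Jan |
| 1   | 12346 | NaN | NaN | NaN | NaN | NaN |
| 2   | 12347 | NaN | NaN | Pend | Pend | 3-Jan |
| 3   | 12348 | NaN | NaN | NaN | Appr | NaN |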
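As an alternative to replacing the fallback label, one could blank out rows that have no match at all. This is only a sketch, not part of the answer above: it assumes the sample frame rebuilt by hand below, and the names `date_cols` and `mask` are introduced here for illustration. It restricts the comparison to the date columns and uses `Series.where` with `mask.any(axis=1)`, so the result does not depend on what the first column happens to be called.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question's table (an assumption)
id_report = pd.DataFrame({
    "req id": [12345, 12346, 12347, 12348],
    "1-Jan": [np.nan, np.nan, np.nan, np.nan],
    "2-Jan": ["Pend", np.nan, np.nan, np.nan],
    "3-Jan": ["Pend", np.nan, "Pend", np.nan],
    "4-Jan": ["Appr", np.nan, "Pend", "Appr"],
})

# Compare only the date columns, leaving the ID column out
date_cols = id_report.columns.drop("req id")
mask = id_report[date_cols].eq("Pend")

# Keep the idxmax label only for rows that contain at least one match;
# rows without any match become NaN
id_report["Pend"] = mask.idxmax(axis=1).where(mask.any(axis=1))
print(id_report)
```

Because the ID column is excluded from the comparison, an ID value can never be mistaken for a match.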


