Pandas: get the highest non-null value in each row of a dataframe with a variable number of columns
I have a dataframe with the following sample data, where the number of columns in Col.x format is unknown:
Col.1,Col.2,Col.3
Val1,
Val2,Val3
Val3,
Val4,Val2,Val3
I need a separate column whose values come from the highest-numbered Col.x column that is not null. For example:
Col.1,Col.2,Col.3,Latest
Val1,,,Val1
Val2,Val3,,Val3
Val3,,,Val3
Val4,Val2,Val3,Val3
I was able to solve the problem with the code below, but this solution a) depends on knowing the exact column names and b) doesn't handle a variable number of columns in a scalable way:
df["Latest"] = np.where(df["Col.3"].isnull(),np.where(df["Col.2"].isnull(),df["Col.1"],df["Col.2"]),df["Col.3"])
Part a) I can solve...
cols = [col for col in df.columns if 'Col' in col]
... I need help with part b).
We can use filter to extract certain columns. Its like and regex parameters are two powerful options for choosing which columns to keep.
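As a rough illustration of how the two options differ, here is a minimal sketch. The frame and the `Collected` column are hypothetical, added only to show that `like` matches any label containing the substring, while an anchored `regex` can restrict the match to the Col.x pattern from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration; column names mirror the question's Col.x pattern.
df = pd.DataFrame({
    "Col.1": ["Val1", "Val2"],
    "Col.2": [np.nan, "Val3"],
    "Collected": ["a", "b"],   # contains "Col" but is not a Col.x column
    "Ignore_me": [1, 2],
})

# like= keeps every column whose label contains the substring.
like_cols = list(df.filter(like="Col").columns)

# regex= is applied with re.search, so anchor it to accept only Col.<digits>.
regex_cols = list(df.filter(regex=r"^Col\.\d+$").columns)

print(like_cols)   # ['Col.1', 'Col.2', 'Collected']
print(regex_cols)  # ['Col.1', 'Col.2']
```

If other columns might share the "Col" substring, the anchored regex is probably the safer choice.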
Given:
Col1 Col2 Col3 Ignore_me
0 18.0 NaN 40.0 82.0
1 6.0 NaN NaN 92.0
2 100.0 NaN 19.0 43.0
3 38.0 98.0 NaN 8.0
Doing:
df['Latest'] = (df[df.filter(like='Col')              # Use filter to select certain columns.
                     .columns
                     .sort_values(ascending=False)]  # Sort them descending.
                .bfill(axis=1)                       # Backfill values along each row.
                .iloc[:, 0])                         # Take the first column,
                                                     # which now holds the first non-NaN value.
Output; we can see that Ignore_me wasn't used:
Col1 Col2 Col3 Ignore_me Latest
0 18.0 NaN 40.0 82.0 40.0
1 6.0 NaN NaN 92.0 6.0
2 100.0 NaN 19.0 43.0 19.0
3 38.0 98.0 NaN 8.0 98.0
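One caveat worth noting: sort_values on column labels compares them as strings, so with ten or more columns 'Col10' sorts before 'Col2'. The sketch below is one possible way around that, assuming every matching label ends in an integer. The `suffix` helper and the sample frame are hypothetical, not part of the answer above.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data with a two-digit suffix to expose the string-sort problem.
df = pd.DataFrame({
    "Col1": [18.0, 6.0],
    "Col2": [np.nan, 7.0],
    "Col10": [40.0, np.nan],
    "Ignore_me": [82.0, 92.0],
})

def suffix(name):
    """Return the trailing integer of a column label such as 'Col10'."""
    return int(re.search(r"(\d+)$", name).group(1))

# Select only Col<digits> labels and order them by numeric suffix, highest first.
cols = sorted(df.filter(regex=r"^Col\d+$").columns, key=suffix, reverse=True)

# Backfill along each row, then take the first column: the highest-numbered non-NaN value.
df["Latest"] = df[cols].bfill(axis=1).iloc[:, 0]

print(cols)                   # ['Col10', 'Col2', 'Col1']
print(df["Latest"].tolist())  # [40.0, 7.0]
```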
Use fillna with functools.reduce:
# sort column names by suffix in reverse order
cols = sorted(
    (col for col in df.columns if col.startswith('Col')),
    key=lambda col: -int(col.split('.')[1])
)
cols
# ['Col.3', 'Col.2', 'Col.1']
from functools import reduce
df['Latest'] = reduce(lambda x, y: x.fillna(y), [df[col] for col in cols])
df
# Col.1 Col.2 Col.3 Latest
#0 Val1 NaN NaN Val1
#1 Val2 NaN Val3 Val3
#2 Val3 NaN NaN Val3
#3 Val4 Val2 Val3 Val3
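For reference, the same reduce idea can be wrapped in a small helper and run against the sample data from the question. This is only a sketch: `latest_non_null` and its `prefix` parameter are hypothetical names, and it assumes every column starting with the prefix ends in an integer (otherwise `int()` raises).

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data from the question, with missing cells as NaN.
df = pd.DataFrame({
    "Col.1": ["Val1", "Val2", "Val3", "Val4"],
    "Col.2": [np.nan, "Val3", np.nan, "Val2"],
    "Col.3": [np.nan, np.nan, np.nan, "Val3"],
})

def latest_non_null(frame, prefix="Col."):
    """Return, per row, the value of the highest-numbered non-null prefixed column."""
    # Order the matching columns by numeric suffix, highest first.
    cols = sorted(
        (c for c in frame.columns if c.startswith(prefix)),
        key=lambda c: int(c[len(prefix):]),
        reverse=True,
    )
    # Start from the highest column and fill its gaps from each lower one in turn.
    return reduce(lambda acc, nxt: acc.fillna(nxt), (frame[c] for c in cols))

df["Latest"] = latest_non_null(df)
print(df["Latest"].tolist())  # ['Val1', 'Val3', 'Val3', 'Val3']
```

Because the column list is derived from the frame itself, the helper does not need to know how many Col.x columns exist.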