Pandas: get the highest-numbered non-null value in each row of a dataframe with a variable number of columns

Question

I have a dataframe with the following sample data, where the number of columns in Col.x format is unknown:

Col.1,Col.2,Col.3
Val1,,
Val2,Val3,
Val3,,
Val4,Val2,Val3

I need a separate column populated with the value from the highest-numbered Col.x that is not null, such as:

Col.1,Col.2,Col.3,Latest
Val1,,,Val1
Val2,Val3,,Val3
Val3,,,Val3
Val4,Val2,Val3,Val3

I was able to solve the problem with the code below, but this solution a) depends on knowing the exact column names and b) doesn't handle a variable number of columns in a scalable way:

import numpy as np
df["Latest"] = np.where(df["Col.3"].isnull(), np.where(df["Col.2"].isnull(), df["Col.1"], df["Col.2"]), df["Col.3"])

Part a) I can solve...

cols = [col for col in df.columns if 'Col' in col]

... I need help with part b).
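One way to make the nested np.where scale is to unroll it into a loop: start from the lowest-numbered column and let each higher-numbered column overwrite the result wherever it is non-null. The sketch below is an illustration, not taken from either answer; it assumes empty cells are NaN and that the relevant columns are named exactly Col.<number> (the regex is an assumption chosen to avoid matching names that merely contain "Col").

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "Col.1": ["Val1", "Val2", "Val3", "Val4"],
    "Col.2": [np.nan, "Val3", np.nan, "Val2"],
    "Col.3": [np.nan, np.nan, np.nan, "Val3"],
})

# Part a): keep only names of the exact form "Col.<digits>" and sort them
# by the integer suffix, so a "Col.10" would come after "Col.2".
pattern = re.compile(r"^Col\.(\d+)$")
cols = sorted(
    (c for c in df.columns if pattern.match(c)),
    key=lambda c: int(pattern.match(c).group(1)),
)

# Part b): the question's nested np.where, unrolled into a loop.
# Begin with the lowest-numbered column; each higher-numbered column
# replaces the current value wherever it is non-null.
latest = df[cols[0]].to_numpy()
for col in cols[1:]:
    latest = np.where(df[col].notna(), df[col], latest)

df["Latest"] = latest
print(df)
```

Because later columns override earlier ones, the loop reproduces the priority order of the original nested expression (Col.3 first, then Col.2, then Col.1) for any number of columns.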

Answer 1

We can use filter to select specific columns. Its like and regex parameters are two powerful options for this.

Given:

    Col1  Col2  Col3  Ignore_me
0   18.0   NaN  40.0       82.0
1    6.0   NaN   NaN       92.0
2  100.0   NaN  19.0       43.0
3   38.0  98.0   NaN        8.0

Doing:

df['Latest'] = (df[df.filter(like='Col') # Using filter to select certain columns.
                     .columns
                     .sort_values(ascending=False)] # Sort names descending (as strings).
                  .bfill(axis=1) # Backfill values along each row.
                  .iloc[:,0]) # Take the first column,
                              # which now holds the first non-NaN value.

In the output, we can see that Ignore_me wasn't used:

    Col1  Col2  Col3  Ignore_me  Latest
0   18.0   NaN  40.0       82.0    40.0
1    6.0   NaN   NaN       92.0     6.0
2  100.0   NaN  19.0       43.0    19.0
3   38.0  98.0   NaN        8.0    98.0
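Note that sort_values orders the column names as strings. That is fine up to Col9, but in descending string order Col10 lands after Col2, so a Col2 value would wrongly win over Col10. A minimal sketch that sorts by the integer suffix instead, assuming names of the form Col<digits> (the data here is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a two-digit suffix, where string sorting would
# pick Col2 (5.0) over Col10 (40.0) in row 0.
df = pd.DataFrame({
    "Col1": [18.0, 6.0, np.nan],
    "Col2": [5.0, 7.0, np.nan],
    "Col10": [40.0, np.nan, np.nan],
    "Ignore_me": [82.0, 92.0, 43.0],
})

# filter(regex=...) keeps only names of the form "Col<digits>".
cols = df.filter(regex=r"^Col\d+$").columns

# Sort by the integer after "Col", highest first.
ordered = sorted(cols, key=lambda c: int(c[3:]), reverse=True)

# Backfill along each row; the first column then holds the first non-null
# value in that order. Rows that are entirely null stay NaN.
df["Latest"] = df[ordered].bfill(axis=1).iloc[:, 0]
print(df)
```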

Answer 2

Use fillna with functools.reduce:

# sort column names by suffix in reverse order
cols = sorted(
   (col for col in df.columns if col.startswith('Col')), 
   key=lambda col: -int(col.split('.')[1])
)
cols
# ['Col.3', 'Col.2', 'Col.1']

from functools import reduce
df['Latest'] = reduce(lambda x, y: x.fillna(y), [df[col] for col in cols])

df
#  Col.1 Col.2 Col.3 Latest
#0  Val1   NaN   NaN   Val1
#1  Val2   NaN  Val3   Val3
#2  Val3   NaN   NaN   Val3
#3  Val4  Val2  Val3   Val3
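A vectorized alternative (not part of either answer) is to locate, per row, the last non-null position in the ascending-ordered columns with NumPy and index into the values directly. A minimal sketch, assuming every column is named Col.<number> and NaN marks an empty cell; the extra all-empty fifth row is added only to show how such rows are handled:

```python
import numpy as np
import pandas as pd

# Sample data in the shape of the question, plus one all-empty row.
df = pd.DataFrame({
    "Col.1": ["Val1", "Val2", "Val3", "Val4", np.nan],
    "Col.2": [np.nan, "Val3", np.nan, "Val2", np.nan],
    "Col.3": [np.nan, np.nan, np.nan, "Val3", np.nan],
})

# Columns in ascending order of their numeric suffix.
cols = sorted(df.columns, key=lambda c: int(c.split(".")[1]))

values = df[cols].to_numpy()
mask = df[cols].notna().to_numpy()

# argmax on the column-reversed mask finds the first True from the right,
# i.e. the highest-numbered non-null column in each row.
last = mask.shape[1] - 1 - mask[:, ::-1].argmax(axis=1)

# argmax returns 0 for all-False rows, so blank those rows out explicitly.
latest = values[np.arange(len(df)), last]
df["Latest"] = np.where(mask.any(axis=1), latest, np.nan)
print(df)
```

This avoids building intermediate Series per column, which may matter when there are many Col.x columns.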


2 answers

Answer 1
1 2022-07-16 18:48:53

Answer 2
0 2022-07-16 18:08:16
