How to coalesce columns with null values in a pandas DataFrame, respecting a priority order?

Question

I am working with a pandas DataFrame for a university project, as shown below:

      import numpy as np
      import pandas as pd

      # np.nan is used because the np.NaN alias was removed in NumPy 2.0
      df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                         'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                         'col2': [np.nan, np.nan, 30, 40, np.nan],
                         'col3': [np.nan, 2, 30, 44, np.nan]
                         })

      print(df)

      col1    sensor_name   col2    col3
      NaN      water         NaN    NaN
      11.2     strain        NaN    2.0
      10.0     fog           30.0   30.0
      NaN      water         40.0   44.0
      1000.0   fog           NaN    NaN

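I want to combine col1, col2 and col3 into a single column so that NaN values are avoided wherever possible. The value in col1 takes priority if it is present; otherwise the value in col2 is used, and finally the value in col3.

I tried running the following code: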

      df['new_column'] = df.ffill(axis=1)['col3']

The output is:

      col1     sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       water
       11.2     strain          NaN     2.0       2.0
       10.0     fog             30.0    30.0      30.0
       NaN      water           40.0    44.0      44.0
       1000.0   fog             NaN     NaN       fog

However, the desired output is:

       col1    sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       NaN
       11.2     strain          NaN     2.0       11.2
       10.0     fog             30.0    30.0      10.0
       NaN      water           40.0    44.0      40.0
       1000.0   fog             NaN     NaN       1000.0

Answer 1

One of the many ways to do this is with the DataFrame.apply method.

import numpy as np
import pandas as pd

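# np.nan is used because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]
                   })

print(df)

def apply_func(row):
    # Return the first non-null value in priority order: col1, col2, col3
    if not pd.isna(row['col1']):
        return row['col1']
    elif not pd.isna(row['col2']):
        return row['col2']
    # Falls through to col3, which may itself be NaN
    return row['col3']

# axis=1 calls apply_func once per row
df["new_cols"] = df.apply(apply_func, axis=1)
print(df)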

Output:

     col1 sensor_name  col2  col3  new_cols
0     NaN       water   NaN   NaN       NaN
1    11.2      strain   NaN   2.0      11.2
2    10.0         fog  30.0  30.0      10.0
3     NaN       water  40.0  44.0      40.0
4  1000.0         fog   NaN   NaN    1000.0

Answer 2

Filter the columns, back-fill along each row, and take col1:

df['new'] = df.filter(like='col').bfill(axis=1)['col1']
df
Out[324]: 
     col1 sensor_name  col2  col3     new
0     NaN       water   NaN   NaN     NaN
1    11.2      strain   NaN   2.0    11.2
2    10.0         fog  30.0  30.0    10.0
3     NaN       water  40.0  44.0    40.0
4  1000.0         fog   NaN   NaN  1000.0
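To see why the back-fill picks the right value, the one-liner can be split into steps. This is only a sketch: the explicit column list `candidates` and the intermediate name `filled` are introduced here for illustration and are not part of the original answer. Listing the columns explicitly also avoids `filter(like='col')` accidentally matching some other column whose name contains "col".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Candidate columns, listed left to right in priority order (illustrative name).
candidates = df[['col1', 'col2', 'col3']]

# Back-fill along each row: every NaN takes the next non-null value to its right.
filled = candidates.bfill(axis=1)

# After back-filling, the leftmost column holds the first non-null value
# of each row, or NaN if the whole row was empty.
df['new'] = filled.iloc[:, 0]
print(df)
```

Because the fill runs from right to left, the column order of `candidates` is what encodes the priority; reordering the list changes which value wins.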

Answer 3

Try this:

f = df.filter(regex=r'col\d')
res = df.assign(new_column=f.where(f.notnull().cumsum(axis=1).eq(1)).max(axis=1))
print(res)
>>>
    col1    sensor_name col2    col3    new_column
0   NaN     water       NaN     NaN     NaN
1   11.2    strain      NaN     2.0     11.2
2   10.0    fog         30.0    30.0    10.0
3   NaN     water       40.0    44.0    40.0
4   1000.0  fog         NaN     NaN     1000.0
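The chained expression above is compact, so here is a step-by-step sketch of what each part appears to do. The intermediate names `seen`, `mask` and `first_only` are hypothetical and only serve the explanation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Select col1, col2, col3 (names matching "col" followed by a digit).
f = df.filter(regex=r'col\d')

# Running count of non-null values per row, going left to right.
seen = f.notnull().cumsum(axis=1)

# The count equals 1 at the first non-null value (and at any NaN cells
# directly after it, which are NaN in f anyway).
mask = seen.eq(1)

# Blank out everything except the first non-null value in each row.
first_only = f.where(mask)

# At most one value survives per row, so max simply extracts it;
# a row that is entirely NaN stays NaN.
res = df.assign(new_column=first_only.max(axis=1))
print(res)
```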

Answer 4

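df['new_column'] = df['col1']  # start from the highest-priority column
# Note: passing index labels to iloc relies on the default RangeIndex
# Rows that are still NaN fall back to col2 (column position 2)
index = df[df['new_column'].isna()].index
df.iloc[index, 4] = df.iloc[index, 2]
# Rows that are still NaN after that fall back to col3 (column position 3)
index = df[df['new_column'].isna()].index
df.iloc[index, 4] = df.iloc[index, 3]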

If you have more columns to consider and/or want this to be more automated, you can write a for loop.
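A minimal sketch of such a loop follows. It swaps the positional `iloc` assignment for `Series.fillna`, which is a substitution on my part rather than the approach shown above, and the list name `priority` is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Columns in descending priority (hypothetical name; extend as needed).
priority = ['col1', 'col2', 'col3']

# Start from the highest-priority column.
df['new_column'] = df[priority[0]]

# Fill the remaining gaps from each lower-priority column in turn.
# fillna only touches rows that are still NaN, so earlier values win.
for col in priority[1:]:
    df['new_column'] = df['new_column'].fillna(df[col])

print(df)
```

Since `fillna` aligns on index labels, this version does not depend on the DataFrame having a default integer index or on the position of `new_column`.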
