
How to join columns in a pandas DataFrame, with empty values, considering priority?

I'm working on a university assignment using a pandas DataFrame like the following:

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                         'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                         'col2': [np.nan, np.nan, 30, 40, np.nan],
                         'col3': [np.nan, 2, 30, 44, np.nan]
                         })

      print(df)

      col1    sensor_name   col2    col3
      NaN      water         NaN    NaN
      11.2     strain        NaN    2.0
      10.0     fog           30.0   30.0
      NaN      water         40.0   44.0
      1000.0   fog           NaN    NaN

I want to combine col1, col2 and col3 into a single column that avoids NaN values. The value in col1 takes priority (if there is one), then col2, and finally col3.

I tried the following code:

      df['new_column'] = df.ffill(axis=1)['col3']

The output is:

      col1     sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       water
       11.2     strain          NaN     2.0       2.0
       10.0     fog             30.0    30.0      30.0
       NaN      water           40.0    44.0      44.0
       1000.0   fog             NaN     NaN       fog

However, the desired output is:

       col1    sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       NaN
       11.2     strain          NaN     2.0       11.2
       10.0     fog             30.0    30.0      10.0
       NaN      water           40.0    44.0      40.0
       1000.0   fog             NaN     NaN       1000.0

One of the many ways to do this is `DataFrame.apply` with `axis=1`, checking the columns in priority order:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]
                   })

print(df)

def apply_func(row):
    # Return the first non-NaN value, checking col1, then col2, then col3
    if not pd.isna(row['col1']):
        return row['col1']
    elif not pd.isna(row['col2']):
        return row['col2']
    return row['col3']

# axis=1 passes each row (as a Series) to apply_func
df["new_cols"] = df.apply(apply_func, axis=1)
print(df)

Output:

     col1 sensor_name  col2  col3  new_cols
0     NaN       water   NaN   NaN       NaN
1    11.2      strain   NaN   2.0      11.2
2    10.0         fog  30.0  30.0      10.0
3     NaN       water  40.0  44.0      40.0
4  1000.0         fog   NaN   NaN    1000.0
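A possible generalization of the `apply` answer, sketched under the assumption that the priority order is kept in a list (the names `PRIORITY` and `first_by_priority` are hypothetical, not from the answer). `Series.first_valid_index` returns the label of the first non-NaN entry, or `None` if there is none, so supporting another column only means extending the list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Hypothetical priority list: earlier entries win.
PRIORITY = ['col1', 'col2', 'col3']

def first_by_priority(row, cols=PRIORITY):
    """Return the first non-NaN value among `cols`, or NaN if all are missing."""
    values = row[cols]
    idx = values.first_valid_index()  # label of the first non-NaN entry, or None
    return np.nan if idx is None else values[idx]

df['new_cols'] = df.apply(first_by_priority, axis=1)
print(df)
```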

Filter the `col` columns, back-fill along each row, and take `col1`:

df['new'] = df.filter(like='col').bfill(axis=1)['col1']
print(df)
     col1 sensor_name  col2  col3     new
0     NaN       water   NaN   NaN     NaN
1    11.2      strain   NaN   2.0    11.2
2    10.0         fog  30.0  30.0    10.0
3     NaN       water  40.0  44.0    40.0
4  1000.0         fog   NaN   NaN  1000.0
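`filter(like='col')` relies on the DataFrame's column order matching the priority order. A minimal variant, assuming the priority is written out as a list (`priority` is an illustrative name), selects the columns explicitly in that order, back-fills along each row, and takes the first column by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Assumed priority order; listing the columns explicitly means the result
# no longer depends on how the DataFrame happens to order them.
priority = ['col1', 'col2', 'col3']

# bfill(axis=1) copies each row's next non-NaN value leftwards, so after
# filling, the first column holds the highest-priority non-NaN value.
df['new'] = df[priority].bfill(axis=1).iloc[:, 0]
print(df)
```

Reordering the list, e.g. `['col2', 'col1', 'col3']`, changes which column wins without touching the DataFrame itself.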

Try this:

f = df.filter(regex=r'col\d')
res = df.assign(new_column=f.where(f.notnull().cumsum(axis=1).eq(1)).max(axis=1))
print(res)
>>>
    col1    sensor_name col2    col3    new_column
0   NaN     water       NaN     NaN     NaN
1   11.2    strain      NaN     2.0     11.2
2   10.0    fog         30.0    30.0    10.0
3   NaN     water       40.0    44.0    40.0
4   1000.0  fog         NaN     NaN     1000.0
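To see why the `cumsum` mask works, the sketch below prints the intermediate steps for the same data (the names `counts`, `mask`, `kept` and `new_column` are illustrative). `notnull().cumsum(axis=1)` counts, left to right, how many non-null values each row has seen so far, so `.eq(1)` is True from the first non-null value up to just before the second one. `where` turns every other cell into NaN; the only number that survives in each row is the first non-null one, so `max(axis=1)` simply returns it (or NaN when the row has none).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

f = df.filter(regex=r'col\d')  # col1, col2, col3 (sensor_name does not match)

# Running count of non-null values per row, left to right.
counts = f.notnull().cumsum(axis=1)
# True from the first non-null value until just before the second one.
mask = counts.eq(1)
print(counts)
print(mask)

# Keep only the masked cells; every other cell becomes NaN.
kept = f.where(mask)
# Each row now holds at most one non-NaN value, so max() just extracts it.
new_column = kept.max(axis=1)
print(new_column)
```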
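Alternatively, start from col1 and fill the remaining gaps from col2, then from col3:

df['new_column'] = df['col1']
# Rows still missing a value: take col2
missing = df['new_column'].isna()
df.loc[missing, 'new_column'] = df.loc[missing, 'col2']
# Rows still missing a value: take col3
missing = df['new_column'].isna()
df.loc[missing, 'new_column'] = df.loc[missing, 'col3']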

If you have more columns to consider, or want this to be more automated, you can wrap these steps in a for loop.
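One way to turn the fill-the-gaps steps into a loop is sketched below. The helper name `coalesce` and its `columns` parameter are hypothetical; the list is assumed to run from highest to lowest priority. Instead of recomputing an index and assigning position by position, each pass uses `Series.fillna` with another Series, which fills every remaining NaN from the same row of the next column (rows are matched by index):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

def coalesce(frame, columns):
    """Return a Series holding, per row, the first non-NaN value in `columns`.

    `columns` is assumed to be ordered from highest to lowest priority.
    Rows where every listed column is NaN stay NaN.
    """
    result = frame[columns[0]].copy()
    for col in columns[1:]:
        # Fill the rows that are still NaN from the next column, aligned by index.
        result = result.fillna(frame[col])
    return result

df['new_column'] = coalesce(df, ['col1', 'col2', 'col3'])
print(df)
```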


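Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.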

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM