
How to join columns in a pandas dataframe, with empty values, considering priority?

I am working on a university assignment with a pandas dataframe like the following:

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                         'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                         'col2': [np.nan, np.nan, 30, 40, np.nan],
                         'col3': [np.nan, 2, 30, 44, np.nan]
                         })

      print(df)

      col1    sensor_name   col2    col3
      NaN      water         NaN    NaN
      11.2     strain        NaN    2.0
      10.0     fog           30.0   30.0
      NaN      water         40.0   44.0
      1000.0   fog           NaN    NaN

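I want to combine col1, col2 and col3 into a single column so as to avoid NaN values. Priority goes to the value in col1, if there is one; otherwise to col2, and finally to col3.

I tried the following code: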

      df['new_column'] = df.ffill(axis=1)['col3']

The output is:

      col1     sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       water
       11.2     strain          NaN     2.0       2.0
       10.0     fog             30.0    30.0      30.0
       NaN      water           40.0    44.0      44.0
       NaN      fog             NaN     NaN       fog

However, the desired output is:

       col1    sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       NaN
       11.2     strain          NaN     2.0       11.2
       10.0     fog             30.0    30.0      10.0
       NaN      water           40.0    44.0      40.0
       1000.0   fog             NaN     NaN       1000.0

One of the many ways to achieve this is to use the pandas DataFrame.apply function.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]
                   })

print(df)

def apply_func(row):
    # Return the first non-NaN value in priority order: col1, col2, col3
    if not pd.isna(row['col1']):
        return row['col1']
    elif not pd.isna(row['col2']):
        return row['col2']
    return row['col3']

# axis=1 passes each row (not each column) to apply_func
df["new_cols"] = df.apply(apply_func, axis=1)
print(df)

Output:

     col1 sensor_name  col2  col3  new_cols
0     NaN       water   NaN   NaN       NaN
1    11.2      strain   NaN   2.0      11.2
2    10.0         fog  30.0  30.0      10.0
3     NaN       water  40.0  44.0      40.0
4  1000.0         fog   NaN   NaN    1000.0

Filter the columns, then back-fill along the rows:

df['new'] = df.filter(like='col').bfill(axis=1)['col1']
df
Out[324]: 
     col1 sensor_name  col2  col3     new
0     NaN       water   NaN   NaN     NaN
1    11.2      strain   NaN   2.0    11.2
2    10.0         fog  30.0  30.0    10.0
3     NaN       water  40.0  44.0    40.0
4  1000.0         fog   NaN   NaN  1000.0
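
To see why back-filling gives the desired priority where the forward fill in the question did not, here is a minimal sketch on the question's data. Forward fill carries values to the right, so col3 ends up with the last available value and the strings in sensor_name can leak in; back-filling the filtered columns carries values to the left instead. The names numeric, filled and new are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Keep only the columns whose name contains 'col', so the strings in
# sensor_name can never end up in the result.
numeric = df.filter(like='col')

# Back-fill along each row: every NaN takes the next non-NaN value to
# its right, so col1 ends up holding the first available value in the
# order col1, col2, col3.
filled = numeric.bfill(axis=1)
new = filled['col1']

print(new.tolist())  # [nan, 11.2, 10.0, 40.0, 1000.0]
```

This relies on the columns already being in priority order from left to right; if they are not, reorder them first, for example by selecting them with an explicit list.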

Try this:

f = df.filter(regex=r'col\d')
res = df.assign(new_column=f.where(f.notnull().cumsum(axis=1).eq(1)).max(axis=1))
print(res)
>>>
    col1    sensor_name col2    col3    new_column
0   NaN     water       NaN     NaN     NaN
1   11.2    strain      NaN     2.0     11.2
2   10.0    fog         30.0    30.0    10.0
3   NaN     water       40.0    44.0    40.0
4   1000.0  fog         NaN     NaN     1000.0
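
The one-liner above packs several steps together. Unpacked as a sketch on the same data, it might look like this (the intermediate names present, seen_so_far and first_only are illustrative and not part of the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

f = df.filter(regex=r'col\d')         # col1, col2, col3 only
present = f.notna()                   # True where a value exists
seen_so_far = present.cumsum(axis=1)  # running count of values per row

# The count equals 1 from the first non-NaN cell up to (but not
# including) the second one, so this keeps only the first value per row.
first_only = f.where(seen_so_far.eq(1))

# Each row now holds at most one value; max(axis=1) skips NaN and picks it.
new_column = first_only.max(axis=1)

print(new_column.tolist())  # [nan, 11.2, 10.0, 40.0, 1000.0]
```
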

df['new_column'] = df['col1']
# Rows still missing a value take it from col2, then from col3
index = df[df['new_column'].isna()].index
df.loc[index, 'new_column'] = df.loc[index, 'col2']
index = df[df['new_column'].isna()].index
df.loc[index, 'new_column'] = df.loc[index, 'col3']

If you have more columns to consider and/or want this to be more automated, you can write a for loop.
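
A minimal sketch of such a loop, assuming the columns are listed in priority order. The list name priority is hypothetical, and Series.fillna is used here in place of the index-based assignment above; it fills only the rows that are still NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Hypothetical list of columns, highest priority first
priority = ['col1', 'col2', 'col3']

# Start from the highest-priority column ...
df['new_column'] = df[priority[0]]

# ... then let each lower-priority column fill whatever is still missing
for name in priority[1:]:
    df['new_column'] = df['new_column'].fillna(df[name])

print(df['new_column'].tolist())  # [nan, 11.2, 10.0, 40.0, 1000.0]
```

Adding another fallback column then only requires appending its name to the list.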
