Pandas: How to create a new column in a Dataframe and add values in it considering other existing columns
How to join columns in a pandas dataframe, with empty values, considering priority?
I am working on a university project using a pandas DataFrame like the following:
import numpy as np
import pandas as pd
# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]
                   })
print(df)
  col1 sensor_name  col2  col3
   NaN       water   NaN   NaN
  11.2      strain   NaN   2.0
  10.0         fog  30.0  30.0
   NaN       water  40.0  44.0
1000.0         fog   NaN   NaN
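I want to combine col1, col2 and col3 into a single column so as to avoid NaN values. The value in col1 takes priority (if there is one), then the value in col2, and finally the value in col3.
I tried the following code: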
df['new_column'] = df.ffill(axis=1)['col3']
The output is:
  col1 sensor_name  col2  col3 new_column
   NaN       water   NaN   NaN      water
  11.2      strain   NaN   2.0        2.0
  10.0         fog  30.0  30.0       30.0
   NaN       water  40.0  44.0       44.0
1000.0         fog   NaN   NaN        fog
However, the desired output is:
  col1 sensor_name  col2  col3  new_column
   NaN       water   NaN   NaN         NaN
  11.2      strain   NaN   2.0        11.2
  10.0         fog  30.0  30.0        10.0
   NaN       water  40.0  44.0        40.0
1000.0         fog   NaN   NaN      1000.0
One of the many ways to achieve this is with the pandas DataFrame.apply function, applied row by row.
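import numpy as np
import pandas as pd
# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]
                   })
print(df)

def apply_func(row):
    # Return the first non-missing value in priority order: col1, col2, col3
    if not pd.isna(row['col1']):
        return row['col1']
    elif not pd.isna(row['col2']):
        return row['col2']
    # Falls through to col3, which may itself be NaN
    return row['col3']

# axis=1 passes each row to apply_func
df['new_cols'] = df.apply(apply_func, axis=1)
print(df)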
Output:
     col1 sensor_name  col2  col3  new_cols
0     NaN       water   NaN   NaN       NaN
1    11.2      strain   NaN   2.0      11.2
2    10.0         fog  30.0  30.0      10.0
3     NaN       water  40.0  44.0      40.0
4  1000.0         fog   NaN   NaN    1000.0
Filter the columns first, then back-fill along each row and take col1:
df['new'] = df.filter(like='col').bfill(axis=1)['col1']
df
Out[324]:
     col1 sensor_name  col2  col3     new
0     NaN       water   NaN   NaN     NaN
1    11.2      strain   NaN   2.0    11.2
2    10.0         fog  30.0  30.0    10.0
3     NaN       water  40.0  44.0    40.0
4  1000.0         fog   NaN   NaN  1000.0
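Try this:
# Raw string so that \d is not treated as an invalid escape sequence
f = df.filter(regex=r'col\d')
# Keep only the first non-null value in each row, then collapse the row to that value
res = df.assign(new_column=f.where(f.notnull().cumsum(axis=1).eq(1)).max(axis=1))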
print(res)
>>>
     col1 sensor_name  col2  col3  new_column
0     NaN       water   NaN   NaN         NaN
1    11.2      strain   NaN   2.0        11.2
2    10.0         fog  30.0  30.0        10.0
3     NaN       water  40.0  44.0        40.0
4  1000.0         fog   NaN   NaN      1000.0
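For anyone wondering why the cumulative-sum mask above picks out the first non-null value, here is a rough step-by-step sketch of the same expression. The intermediate names `seen`, `mask` and `first_valid` are mine, not from the answer, and the DataFrame is simply the one from the question rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Only the value columns, in their priority order
f = df.filter(regex=r'col\d')

# Running count of non-null values seen so far, left to right within each row
seen = f.notnull().cumsum(axis=1)

# The count equals 1 at the first non-null cell (and at any NaN cells that
# follow it before a second non-null value appears)
mask = seen.eq(1)

# Blank out every other cell; at most one non-null value is left per row,
# so max(axis=1) simply returns it (or NaN when the row is entirely empty)
first_valid = f.where(mask).max(axis=1)

print(seen)
print(mask)
print(first_valid)
```

The NaN cells that also satisfy the mask do no harm, because max skips missing values by default.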
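# Start from col1, then fill the remaining gaps from col2 and finally col3.
# Positional indices: col2 is 2, col3 is 3, new_column is 4.
# Passing index labels to iloc relies on the default 0..n-1 index.
df['new_column'] = df['col1']
index = df[df['new_column'].isna()].index
df.iloc[index, 4] = df.iloc[index, 2]
index = df[df['new_column'].isna()].index
df.iloc[index, 4] = df.iloc[index, 3]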
If you have more columns to consider and/or want this to be more automated, you can write a for loop.
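A loop along those lines could look roughly like the sketch below. It is not the answerer's code: the `priority` list and the use of `Series.fillna` in place of the `iloc` assignments are my own assumptions, chosen so that the column positions do not have to be hard-coded.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'col1': [np.nan, 11.2, 10, np.nan, 1000],
                   'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                   'col2': [np.nan, np.nan, 30, 40, np.nan],
                   'col3': [np.nan, 2, 30, 44, np.nan]})

# Hypothetical priority list: highest priority first; extend it as needed
priority = ['col1', 'col2', 'col3']

# Start from the highest-priority column ...
new_column = df[priority[0]].copy()

# ... and fill whatever is still missing from each lower-priority column in turn
for name in priority[1:]:
    new_column = new_column.fillna(df[name])

df['new_column'] = new_column
print(df)
```

Rows where every listed column is NaN stay NaN, which matches the desired output for the first row.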