Create a pandas DataFrame column based on strings in two other columns

Question

I have a DataFrame that looks like this:

boat_type   boat_type_2
Not Known   Not Known
Not Known   kayak
ship        Not Known
Not Known   Not Known
ship        Not Known

I want to create a third column, boat_type_final, which should look like this:

boat_type   boat_type_2  boat_type_final
Not Known   Not Known    cruise
Not Known   kayak        kayak
ship        Not Known    ship  
Not Known   Not Known    cruise
ship        Not Known    ship

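So basically, if both boat_type and boat_type_2 contain 'Not Known', the value should be 'cruise'. But if either of the first two columns holds a string other than 'Not Known', boat_type_final should be filled with that string, i.e. 'kayak' or 'ship'.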

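What is the most elegant way to do this? I have seen several options, such as where, writing a function, and/or boolean logic, and I would like to know what a true pythonista would do.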

Here is my code so far:

import pandas as pd
import numpy as np
data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
    {'boat_type': 'Not Known',  'boat_type_2': 'kayak'},
    {'boat_type': 'ship',  'boat_type_2': 'Not Known'},
    {'boat_type': 'Not Known',  'boat_type_2': 'Not Known'},
    {'boat_type': 'ship',  'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)
df['phone_type_final'] = np.where(df.phone_type.str.contains('Not'))...

Answer 1

Use:

df['boat_type_final'] = (df.replace('Not Known',np.nan)
                           .ffill(axis=1)
                           .iloc[:, -1]
                           .fillna('cruise'))
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship

Explanation:

First, replace 'Not Known' with missing values:

print (df.replace('Not Known',np.nan))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship         NaN
3       NaN         NaN
4      ship         NaN

Then replace the NaNs by forward filling along each row:

print (df.replace('Not Known',np.nan).ffill(axis=1))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship        ship
3       NaN         NaN
4      ship        ship

Select the last column by position with iloc:

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1])
0      NaN
1    kayak
2     ship
3      NaN
4     ship
Name: boat_type_2, dtype: object

Finally, since NaNs may remain (rows where both columns were 'Not Known'), add fillna:

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1].fillna('cruise'))
0    cruise
1     kayak
2      ship
3    cruise
4      ship
Name: boat_type_2, dtype: object
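One detail the steps above imply but do not show is which value wins when both columns hold a known string. The sketch below uses a hypothetical third row (not in the question's data) where both columns are filled, and compares forward filling with the mirrored approach, bfill plus the first column. Treat it as an illustration of the precedence rather than a recommendation of either.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has a known value in BOTH columns,
# a case the original question does not cover.
df = pd.DataFrame({
    'boat_type':   ['Not Known', 'ship', 'ship'],
    'boat_type_2': ['kayak', 'Not Known', 'kayak'],
})

cleaned = df.replace('Not Known', np.nan)

# Forward fill along each row, then take the last column:
# the right-most known value wins.
right_most = cleaned.ffill(axis=1).iloc[:, -1].fillna('cruise')

# Backward fill along each row, then take the first column:
# the left-most known value wins.
left_most = cleaned.bfill(axis=1).iloc[:, 0].fillna('cruise')

print(pd.DataFrame({'right_most': right_most, 'left_most': left_most}))
```

For the question's own data the two variants agree, because no row has more than one known value.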

Another solution, if there are only a few columns, is to use numpy.select:

m1 = df['boat_type'] == 'ship'
m2 = df['boat_type_2'] == 'kayak'

df['boat_type_final'] = np.select([m1, m2], ['ship','kayak'], default='cruise')
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
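The numpy.select version above hard-codes 'ship' and 'kayak'. If other boat names can appear, one possible variation is to test for "anything other than 'Not Known'" and pass the columns themselves as the choices. This is a sketch under that assumption; the mask names known_1 and known_2 are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# Hypothetical mask names: True where the column holds a real value.
known_1 = df['boat_type'] != 'Not Known'
known_2 = df['boat_type_2'] != 'Not Known'

# Conditions are checked in order, so boat_type takes priority
# over boat_type_2 when both are known.
df['boat_type_final'] = np.select(
    [known_1, known_2],
    [df['boat_type'], df['boat_type_2']],
    default='cruise',
)
print(df)
```

Because numpy.select evaluates the conditions in order, the first matching column decides the result.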

Answer 2

Another solution is to define a function that contains the mapping:

def my_func(row):
    if row['boat_type']!='Not Known':
        return row['boat_type']
    elif row['boat_type_2']!='Not Known':
        return row['boat_type_2']
    else: 
        return 'cruise'

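[Note: you did not mention what should happen when neither column is 'Not Known'. The function above returns boat_type in that case.]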

Then simply apply the function:

df.loc[:,'boat_type_final'] = df.apply(my_func, axis=1)

print(df)

Output:

   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
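Row-wise apply is easy to read but calls the Python function once per row. As a rough sketch of the same priority order (boat_type first, then boat_type_2, then 'cruise') without apply, Series.mask and fillna can be chained. The variable names first and second are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# mask() turns 'Not Known' into NaN and leaves other values unchanged.
first = df['boat_type'].mask(df['boat_type'] == 'Not Known')
second = df['boat_type_2'].mask(df['boat_type_2'] == 'Not Known')

# Fall back from boat_type to boat_type_2, then to the default 'cruise'.
df['boat_type_final'] = first.fillna(second).fillna('cruise')
print(df)
```

On the question's data this should give the same column as my_func.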


Answer 1
4, accepted, 2018-07-25 09:45:36

Answer 2
2, 2018-07-25 10:25:18
