Pandas: overlay a column into a row with a blank one
I have a dataframe that looks like this:
                dcc3   manager1   manager2
party_num
L21635789  SBAS01030  A22677981        NaN
L21635789  SBAS02030        NaN  A22810282
L21635789  SBAS03030        NaN  A21721880
I'm trying to "overlay" one manager2 value (it doesn't matter which row it comes from) onto the row that contains manager1, whose manager2 is blank/NaN, like this:
                dcc3   manager1   manager2
party_num
L21635789  SBAS01030  A22677981  A22810282
L21635789  SBAS02030        NaN        NaN
L21635789  SBAS03030        NaN        NaN
or
                dcc3   manager1   manager2
party_num
L21635789  SBAS01030  A22677981  A21721880
L21635789  SBAS02030        NaN        NaN
L21635789  SBAS03030        NaN        NaN
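Obviously we need to reindex on dcc3, but then what? It only needs to overlay these 2 columns (and only these columns exist).
I could really use some help; thanks in advance.
Sorry, I didn't make it clear that this is a basic case. In some cases there is only one value (where this doesn't apply), and in others up to 5-6. I used 3 rows as an example.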
You can do this with np.where:
import numpy as np
# NumPy coerces np.nan to the string 'nan' here (hence the lowercase nan below)
df['manager2'] = np.where(df['manager1'].notnull() & df['manager2'].isnull(),
                          df['manager2'].dropna().iloc[0], np.nan)  # use df['manager2'].dropna().iloc[1] for the other value
df
Out[1]:
                dcc3   manager1   manager2
party_num
L21635789  SBAS01030  A22677981  A22810282
L21635789  SBAS02030        NaN        nan
L21635789  SBAS03030        NaN        nan
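Because one branch of np.where is a string and the other a float, the result is a string array, so the "missing" entries above are really the text 'nan'. A minimal sketch of the same idea that keeps genuine missing values, assuming the column names from the question and at least one non-null manager2; the variable names fill_value, target and new are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample frame from the question, with party_num as the (repeated) index.
df = pd.DataFrame(
    {
        "dcc3": ["SBAS01030", "SBAS02030", "SBAS03030"],
        "manager1": ["A22677981", np.nan, np.nan],
        "manager2": [np.nan, "A22810282", "A21721880"],
    },
    index=pd.Index(["L21635789"] * 3, name="party_num"),
)

# First available manager2 value (use .iloc[1] to pick the other one).
fill_value = df["manager2"].dropna().iloc[0]

# Rows that have a manager1 but no manager2 receive the value.
target = (df["manager1"].notna() & df["manager2"].isna()).to_numpy()

# Build the new column as an object array so the blanks stay real NaN
# instead of being coerced to the string 'nan'.
new = np.full(len(df), np.nan, dtype=object)
new[target] = fill_value
df["manager2"] = new

print(df)
```

Building the array positionally also sidesteps any index alignment on the duplicated party_num labels.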
These two lines of code should do the trick:
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
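The two lines above fill across the whole frame, so when it holds more than one party_num, ffill can carry a manager2 value from one party into the next. A minimal sketch that does the same fill per party_num with GroupBy.bfill and GroupBy.ffill, assuming party_num is a regular column as in the scenarios that follow; the second party number L99999999 and its manager1 A22000000 are made up for illustration:

```python
import numpy as np
import pandas as pd

c = ["party_num", "dcc3", "manager1", "manager2"]
d = [
    ["L21635789", "SBAS01030", "A22677981", np.nan],
    ["L21635789", "SBAS02030", np.nan, "A22810282"],
    # Hypothetical second party that has no manager2 at all.
    ["L99999999", "SBAS01030", "A22000000", np.nan],
]
df = pd.DataFrame(data=d, columns=c)

# Fill within each party_num only. A whole-column bfill().ffill() would
# carry A22810282 into the L99999999 row.
df["manager2"] = df.groupby("party_num")["manager2"].bfill()
df["manager2"] = df.groupby("party_num")["manager2"].ffill()

# Keep manager2 only on rows that carry a manager1 value.
df.loc[df["manager1"].isna(), "manager2"] = np.nan
print(df)
```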
Below are a few scenarios I tried; the code is the same in each. See if this is what you want.
import pandas as pd
import numpy as np
c=['party_num','dcc3','manager1','manager2']
Row 1: manager1 = NaN, manager2 = value
Result: a manager2 value is assigned to row 2
print ('\nScenario 1')
print ('row 1: manager 1: NaN, manager 2: value; pick row2 manager 1 value')
d = [['L21635789','SBAS01030',np.nan,'A22810282'],
     ['L21635789','SBAS02030','A22677981',np.nan],
     ['L21635789','SBAS03030',np.nan,'A21721880']]
df = pd.DataFrame(data=d,columns=c)
print (df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print ()
print (df)
Output of scenario 1:
Scenario 1
row 1: manager 1: NaN, manager 2: value; pick row2 manager 1 value
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN A22810282
1 L21635789 SBAS02030 A22677981 NaN
2 L21635789 SBAS03030 NaN A21721880
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 A22677981 A21721880
2 L21635789 SBAS03030 NaN NaN
Row 1: manager1 = value, manager2 = NaN
Result: a manager2 value is assigned to row 1
print ('\nScenario 2')
print ('row 1: manager 1: value, manager 2: NaN; pick row2 manager 2 value')
d = [['L21635789','SBAS01030','A22677981',np.nan],
     ['L21635789','SBAS02030',np.nan,'A22810282'],
     ['L21635789','SBAS03030',np.nan,'A21721880']]
df = pd.DataFrame(data=d,columns=c)
print (df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print ()
print (df)
Output of scenario 2:
Scenario 2
row 1: manager 1: value, manager 2: NaN; pick row2 manager 2 value
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 A22677981 NaN
1 L21635789 SBAS02030 NaN A22810282
2 L21635789 SBAS03030 NaN A21721880
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 A22677981 A22810282
1 L21635789 SBAS02030 NaN NaN
2 L21635789 SBAS03030 NaN NaN
Row 1: manager1 = NaN, manager2 = NaN
Row 2: manager1 = value, manager2 = NaN; row 3: manager2 = value
Result: row 3's manager2 value is assigned to row 2
print ('\nScenario 3')
print ('row 1: manager 1: NaN, manager 2: NaN; pick row2 manager 1 & row 3 manager 2')
d = [['L21635789','SBAS01030',np.nan,np.nan],
     ['L21635789','SBAS02030','A22677981',np.nan],
     ['L21635789','SBAS03030',np.nan,'A21721880']]
df = pd.DataFrame(data=d,columns=c)
print (df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print ()
print (df)
Output of scenario 3:
Scenario 3
row 1: manager 1: NaN, manager 2: NaN; pick row2 manager 1 & row 3 manager 2
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 A22677981 NaN
2 L21635789 SBAS03030 NaN A21721880
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 A22677981 A21721880
2 L21635789 SBAS03030 NaN NaN
Row 1: manager1 = NaN, manager2 = value
Row 3: manager1 = value, manager2 = value
Result: rows 1 and 2 are left blank, because row 3 already has values for both manager1 and manager2
print ('\nScenario 4')
print ('row 1: manager 1: NaN, manager 2: value; row3 has both manager 1 & manager 2')
d = [['L21635789','SBAS01030',np.nan,'A21721880'],
     ['L21635789','SBAS02030',np.nan,np.nan],
     ['L21635789','SBAS03030','A22677981','A21721882']]
df = pd.DataFrame(data=d,columns=c)
print (df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print ()
print (df)
Output of scenario 4:
Scenario 4
row 1: manager 1: NaN, manager 2: value; row3 has both manager 1 & manager 2
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN A21721880
1 L21635789 SBAS02030 NaN NaN
2 L21635789 SBAS03030 A22677981 A21721882
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 NaN NaN
2 L21635789 SBAS03030 A22677981 A21721882
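The same two lines are repeated in every scenario. A minimal sketch that wraps them in a helper working on a copy, checked against scenarios 1 and 4; overlay_manager2 is a hypothetical name, not a pandas function:

```python
import numpy as np
import pandas as pd

COLUMNS = ["party_num", "dcc3", "manager1", "manager2"]


def overlay_manager2(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: same logic as the two-line answer, on a copy."""
    out = df.copy()
    # Back-fill, then forward-fill, so every row sees a nearby manager2 value.
    out["manager2"] = out["manager2"].bfill().ffill()
    # Blank manager2 wherever the same row has no manager1.
    out.loc[out["manager1"].isna(), "manager2"] = np.nan
    return out


scenario_1 = pd.DataFrame(
    [
        ["L21635789", "SBAS01030", np.nan, "A22810282"],
        ["L21635789", "SBAS02030", "A22677981", np.nan],
        ["L21635789", "SBAS03030", np.nan, "A21721880"],
    ],
    columns=COLUMNS,
)
scenario_4 = pd.DataFrame(
    [
        ["L21635789", "SBAS01030", np.nan, "A21721880"],
        ["L21635789", "SBAS02030", np.nan, np.nan],
        ["L21635789", "SBAS03030", "A22677981", "A21721882"],
    ],
    columns=COLUMNS,
)

result_1 = overlay_manager2(scenario_1)
result_4 = overlay_manager2(scenario_4)
print(result_1)
print(result_4)
```

Working on a copy leaves the input frames untouched, so the same scenario data can be reused for further checks.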