Pandas - Create a column based on values from 2 other columns
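I'm trying to solve a problem with Pandas, but I'm not sure where to start.
I have a DataFrame with several columns, but the ones relevant to this question look like this: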
import numpy as np
import pandas as pd
df = pd.DataFrame(data = {'subject': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'val': [np.nan, 2, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 10]})
   subject   val
0        1   NaN
1        1   2.0
2        1   NaN
3        2   NaN
4        2   NaN
5        2   7.0
6        3   NaN
7        3   NaN
8        3  10.0
I want to create a third column that holds, for every row of a subject, that subject's value from the val column:
   subject   val  total
0        1   NaN      2
1        1   2.0      2
2        1   NaN      2
3        2   NaN      7
4        2   NaN      7
5        2   7.0      7
6        3   NaN     10
7        3   NaN     10
8        3  10.0     10
I know I can do
df[['subject', 'val']].dropna()
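to get the values for the third column, but that loses all the other columns in the DataFrame (whose values differ from row to row).
Thanks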
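Try this:
df['total'] = df.groupby('subject')['val'].transform('sum')  # sum skips NaN; correct here because each subject has exactly one non-NaN val
Or:
df['total2'] = df.groupby('subject')['val'].transform(lambda x: x.dropna().unique()[0])  # drops NaN and takes the unique value of each group; assumes every group has at least one non-NaN val
Output: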
   subject   val  total  total2
0        1   NaN    2.0     2.0
1        1   2.0    2.0     2.0
2        1   NaN    2.0     2.0
3        2   NaN    7.0     7.0
4        2   NaN    7.0     7.0
5        2   7.0    7.0     7.0
6        3   NaN   10.0    10.0
7        3   NaN   10.0    10.0
8        3  10.0   10.0    10.0
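One caveat on the 'sum' version: it matches the desired output only because each subject has a single non-NaN val. A small sketch of the difference, using made-up data (not from the question) in which subject 1 carries the same value twice, and comparing against transform('first'), which takes the first non-NaN value per group:

```python
import numpy as np
import pandas as pd

# Hypothetical data: subject 1 has two non-NaN values (not in the original question)
demo = pd.DataFrame({
    'subject': [1, 1, 1, 2, 2],
    'val': [np.nan, 2, 2, np.nan, 7],
})

# 'sum' adds up every non-NaN value in the group
total_sum = demo.groupby('subject')['val'].transform('sum')
# 'first' picks the first non-NaN value in the group
total_first = demo.groupby('subject')['val'].transform('first')

demo['total_sum'] = total_sum
demo['total_first'] = total_first
print(demo)
```

If a subject can have more than one filled-in row, 'first' (or 'last', 'max', ...) is probably closer to what you want than 'sum'.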
Using ffill and bfill within each group:
df['New'] = df.groupby('subject')['val'].transform(lambda x: x.ffill().bfill())  # transform keeps the original index, so the result aligns with df
df
Out[257]:
   subject   val   New
0        1   NaN   2.0
1        1   2.0   2.0
2        1   NaN   2.0
3        2   NaN   7.0
4        2   NaN   7.0
5        2   7.0   7.0
6        3   NaN  10.0
7        3   NaN  10.0
8        3  10.0  10.0
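For completeness, the dropna() idea from the question can also be made to work without losing the other columns, by turning the non-NaN rows into a subject -> val lookup and mapping it back. This is only a sketch: the 'other' column and the name lookup are made up for illustration, and it assumes one value per subject is enough (drop_duplicates keeps the first one).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subject': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'val': [np.nan, 2, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 10],
    'other': list('abcdefghi'),  # hypothetical extra column that must survive
})

# Keep the rows that carry a value, one per subject, indexed by subject
lookup = (
    df.dropna(subset=['val'])
      .drop_duplicates('subject')
      .set_index('subject')['val']
)

# Map every row's subject to its value; the original rows and columns are untouched
df['total'] = df['subject'].map(lookup)
print(df)
```

A subject with no non-NaN val at all would simply get NaN in total.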