Filling NaNs in a subset of a df from another df based on a condition
Given a reference table fallback:
       Morning  Afternoon  Evening
Red          4        6.0       13
Blue         7        NaN        9
Green        9        1.0        2
and a DataFrame players:
  Player  Morning  Afternoon  Evening   Team  Total
0   Bill      4.0        NaN     13.0    Red   17.0
1   Emma      NaN        NaN      NaN   Blue    0.0
2   Mike      NaN        1.0      NaN  Green    1.0
3   Jill      NaN        NaN      NaN    Red    0.0
I would like to fill the NaN values in players according to the following rule: for a player missing data in all three of Morning, Afternoon and Evening (i.e. whose Total is zero), fill those three columns from the row of fallback matching their Team.
Desired outcome:
  Player  Morning  Afternoon  Evening   Team
0   Bill      4.0        NaN     13.0    Red
1   Emma      7.0        NaN      9.0   Blue
2   Mike      NaN        1.0      NaN  Green
3   Jill      4.0        6.0     13.0    Red
Code to generate sample data:
import numpy as np
import pandas as pd

fallback = pd.DataFrame(
    {
        'Morning': [4, 7, 9],
        'Afternoon': [6, np.nan, 1],
        'Evening': [13, 9, 2]
    },
    index=['Red', 'Blue', 'Green'])
players = pd.DataFrame({
    'Player': ['Bill', 'Emma', 'Mike', 'Jill'],
    'Morning': [4, np.nan, np.nan, np.nan],
    'Afternoon': [np.nan, np.nan, 1, np.nan],
    'Evening': [13, np.nan, np.nan, np.nan],
    'Team': ['Red', 'Blue', 'Green', 'Red']
})
players['Total'] = players[['Morning', 'Afternoon', 'Evening']].sum(axis=1)
outcome = pd.DataFrame({
    'Player': ['Bill', 'Emma', 'Mike', 'Jill'],
    'Morning': [4, 7, np.nan, 4],
    'Afternoon': [np.nan, np.nan, 1, 6],
    'Evening': [13, 9, np.nan, 13],
    'Team': ['Red', 'Blue', 'Green', 'Red']
})
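A side note on the rule as stated: sum skips NaN by default, so a Total of zero does not by itself prove that all three columns are missing; a player who really scored 0 in every slot would also total zero. The sketch below illustrates the difference on a small made-up frame (the scores table and its row labels are hypothetical, not part of the sample data above).

```python
import numpy as np
import pandas as pd

cols = ['Morning', 'Afternoon', 'Evening']

# Hypothetical two-row frame: one row entirely missing, one row entirely zero.
scores = pd.DataFrame({
    'Morning': [np.nan, 0.0],
    'Afternoon': [np.nan, 0.0],
    'Evening': [np.nan, 0.0],
}, index=['all_missing', 'all_zero'])

# sum() skips NaN by default, so both rows total 0.0.
total = scores[cols].sum(axis=1)

# Testing for missing values directly separates the two cases.
all_missing = scores[cols].isna().all(axis=1)

# min_count=1 makes the sum NaN when a row has no valid value at all.
total_strict = scores[cols].sum(axis=1, min_count=1)
```

For the sample data in the question the two tests agree, which is probably why both answers build their mask from isna and all rather than from Total.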
Use DataFrame.combine_first aligned on Team: convert the Team column to the index, then fill only the rows selected by a condition that tests, with DataFrame.isna and DataFrame.all, whether all three columns are missing:
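# index by Team so that rows align with fallback
df = players.set_index('Team')
# True for rows where all three columns are missing
m = df[['Morning','Afternoon','Evening']].isna().all(axis=1)
# fill the selected rows from fallback; values already present take priority
df[m] = df[m].combine_first(fallback)
# restore Team as a column and the original column order
players = df.reset_index().reindex(players.columns, axis=1)
print(players)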
  Player  Morning  Afternoon  Evening   Team  Total
0   Bill      4.0        NaN     13.0    Red   17.0
1   Emma      7.0        NaN      9.0   Blue    0.0
2   Mike      NaN        1.0      NaN  Green    1.0
3   Jill      4.0        6.0     13.0    Red    0.0
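The same rule can also be applied without moving Team into the index. The following is only a sketch of that variant, not the answerer's method: it looks up one fallback row per selected player and assigns the values by position. It assumes the index of fallback is unique; the names cols and fill are introduced here for illustration.

```python
import numpy as np
import pandas as pd

fallback = pd.DataFrame(
    {'Morning': [4, 7, 9], 'Afternoon': [6, np.nan, 1], 'Evening': [13, 9, 2]},
    index=['Red', 'Blue', 'Green'])
players = pd.DataFrame({
    'Player': ['Bill', 'Emma', 'Mike', 'Jill'],
    'Morning': [4, np.nan, np.nan, np.nan],
    'Afternoon': [np.nan, np.nan, 1, np.nan],
    'Evening': [13, np.nan, np.nan, np.nan],
    'Team': ['Red', 'Blue', 'Green', 'Red']
})

cols = ['Morning', 'Afternoon', 'Evening']

# True for players with no data in any of the three columns.
m = players[cols].isna().all(axis=1)

# One fallback row per selected player, in the same order as the players.
fill = fallback.reindex(players.loc[m, 'Team'].to_numpy())[cols]

# Assign by position, so the Team labels never have to match the row index.
players.loc[m, cols] = fill.to_numpy()
```

Because the assignment is positional, the row labels of players are left untouched and no reset_index or column reordering is needed afterwards.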
We can select the rows with all and isna, reindex fallback by the Team of those rows so that it carries their row index, and then call update:
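# rows where all three columns are missing
player2 = players[players[['Morning','Afternoon','Evening']].isna().all(axis=1)]
# one fallback row per selected player, relabelled with that player's row index
fallback = fallback.reindex(player2.Team).reset_index()
fallback.index = player2.index
# update modifies players in place
players.update(fallback)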
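The reason for restricting fallback to the selected rows first is how update behaves: it aligns on index and column labels, overwrites existing values by default, and ignores NaN in the frame passed to it. A minimal sketch on made-up data (the frames target and other are hypothetical) shows these three points:

```python
import numpy as np
import pandas as pd

# Hypothetical frames, used only to show how update aligns and overwrites.
target = pd.DataFrame({'a': [1.0, np.nan, np.nan],
                       'b': [np.nan, np.nan, 5.0]})
other = pd.DataFrame({'a': [10.0, 20.0],
                      'b': [np.nan, 30.0]}, index=[0, 1])

# update works in place and returns None.
result = target.update(other)

# Row 0: 'a' is overwritten (1.0 -> 10.0); 'b' stays NaN because other has NaN there.
# Row 1: both values are taken from other.
# Row 2: untouched, since other has no row labelled 2.
```

Had the whole fallback table been aligned to every player, Bill's and Mike's existing values would have been overwritten too, which is why only the all-NaN rows are passed to update.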