Filling NaNs in a subset of a df from another df based on a condition

Given a reference table fallback:

    Morning     Afternoon   Evening
Red     4         6.0           13
Blue    7         NaN           9
Green   9         1.0           2

and a DataFrame players:

    Player  Morning     Afternoon   Evening     Team    Total
0   Bill    4.0             NaN      13.0       Red     17.0
1   Emma    NaN             NaN      NaN        Blue    0.0
2   Mike    NaN             1.0      NaN        Green   1.0
3   Jill    NaN             NaN      NaN        Red     0.0

I would like to fill NaN data in players according to the following rule: for a player missing data in all three of Morning, Afternoon and Evening (i.e. whose Total is zero), fill those three columns from the row of fallback matching their Team.

Desired outcome:

    Player  Morning     Afternoon   Evening     Team
0   Bill    4.0            NaN      13.0        Red
1   Emma    7.0            NaN      9.0         Blue
2   Mike    NaN            1.0      NaN         Green
3   Jill    4.0            6.0      13.0        Red

Code to generate sample data:

import numpy as np
import pandas as pd

# Per-team fallback values, indexed by team name
fallback = pd.DataFrame(
    {
        'Morning': [4, 7, 9],
        'Afternoon': [6, np.nan, 1],
        'Evening': [13, 9, 2]
    },
    index=['Red', 'Blue', 'Green'])

players = pd.DataFrame({
    'Player': ['Bill', 'Emma', 'Mike', 'Jill'],
    'Morning': [4, np.nan, np.nan, np.nan],
    'Afternoon': [np.nan, np.nan, 1, np.nan],
    'Evening': [13, np.nan, np.nan, np.nan],
    'Team': ['Red', 'Blue', 'Green', 'Red']
})
# sum skips NaN, so a row with no data gets a Total of 0
players['Total'] = players[['Morning', 'Afternoon', 'Evening']].sum(axis=1)

outcome = pd.DataFrame({
    'Player': ['Bill', 'Emma', 'Mike', 'Jill'],
    'Morning': [4, 7, np.nan, 4],
    'Afternoon': [np.nan, np.nan, 1, 6],
    'Evening': [13, 9, np.nan, 13],
    'Team': ['Red', 'Blue', 'Green', 'Red']
})
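
The rule depends on detecting rows where all three columns are missing. The sketch below rebuilds only the players table from the sample data and compares two ways of finding those rows. The names cols, all_missing and zero_total are illustrative and are not part of the question.

```python
import numpy as np
import pandas as pd

players = pd.DataFrame({
    'Player': ['Bill', 'Emma', 'Mike', 'Jill'],
    'Morning': [4, np.nan, np.nan, np.nan],
    'Afternoon': [np.nan, np.nan, 1, np.nan],
    'Evening': [13, np.nan, np.nan, np.nan],
    'Team': ['Red', 'Blue', 'Green', 'Red'],
})
cols = ['Morning', 'Afternoon', 'Evening']  # illustrative helper name
players['Total'] = players[cols].sum(axis=1)

# Direct test: every one of the three columns is NaN in that row
all_missing = players[cols].isna().all(axis=1)
# Proxy from the question: Total is zero because sum() skips NaN
zero_total = players['Total'].eq(0)

print(all_missing.tolist())  # [False, True, False, True]
print(zero_total.tolist())   # [False, True, False, True]
```

On this sample the two masks agree. The direct isna test is probably the safer choice in general, because a row whose real values happen to sum to zero would also have a Total of 0.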

Use DataFrame.combine_first, aligned on Team by converting the Team column to the index, and apply it only to the rows that meet the condition. The condition (all three columns missing) is tested with DataFrame.isna and DataFrame.all:

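# Index by Team so rows align with fallback's index
df = players.set_index('Team')
# True for rows where all three columns are missing
m = df[['Morning', 'Afternoon', 'Evening']].isna().all(axis=1)

# Fill only those rows from fallback, then restore the original layout
df[m] = df[m].combine_first(fallback)
players = df.reset_index().reindex(players.columns, axis=1)
print(players)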
  Player  Morning  Afternoon  Evening   Team  Total
0   Bill      4.0        NaN     13.0    Red   17.0
1   Emma      7.0        NaN      9.0   Blue    0.0
2   Mike      NaN        1.0      NaN  Green    1.0
3   Jill      4.0        6.0     13.0    Red    0.0

We can select the rows with isna and all, reindex fallback by those rows' Team and give it their row index, then update:

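# Rows where all three columns are missing
player2 = players[players[['Morning', 'Afternoon', 'Evening']].isna().all(axis=1)]
# One fallback row per such player, relabelled with the players' row index
# (note: this overwrites the original fallback table)
fallback = fallback.reindex(player2.Team).reset_index()
fallback.index = player2.index
# Overwrite players in place with the non-NaN fallback values
players.update(fallback)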
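
For reference, here is a self-contained sketch of the update approach on the sample data. It assumes every team in players appears exactly once in fallback's index. The names cols, to_fill and fb_rows are illustrative, and the lookup is stored in a separate variable so that fallback itself is left untouched.

```python
import numpy as np
import pandas as pd

fallback = pd.DataFrame(
    {'Morning': [4, 7, 9], 'Afternoon': [6, np.nan, 1], 'Evening': [13, 9, 2]},
    index=['Red', 'Blue', 'Green'])

players = pd.DataFrame({
    'Player': ['Bill', 'Emma', 'Mike', 'Jill'],
    'Morning': [4, np.nan, np.nan, np.nan],
    'Afternoon': [np.nan, np.nan, 1, np.nan],
    'Evening': [13, np.nan, np.nan, np.nan],
    'Team': ['Red', 'Blue', 'Green', 'Red'],
})
cols = ['Morning', 'Afternoon', 'Evening']  # illustrative helper name

# Rows where all three columns are missing
to_fill = players[players[cols].isna().all(axis=1)]

# One fallback row per such player, looked up by team,
# then relabelled with the players' own row index
fb_rows = fallback.reindex(to_fill['Team'])
fb_rows.index = to_fill.index

# update() modifies players in place and only copies non-NaN values,
# so Emma's Afternoon stays NaN because Blue has no Afternoon fallback
players.update(fb_rows)
print(players)
```

If players also carries the Total column from the question, update leaves it unchanged, so it would need to be recomputed after the fill.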

