Python Pandas: update a dataframe column using values based on a column in another dataframe with an overlapping index

Question

This probably has a straightforward answer, but somehow I'm not seeing it.

I have two dataframes, df_a and df_b. df_b.index is a subset of df_a.index.

df_a

              Actioncode   Group

    Mary         1.0         I
    Paul         1.0         I
    Robert       4.0         O
    David        4.0         O
    Julia        4.0         O

Note that Group pertains to an Actioncode (it just makes the Actioncode readable).

df_b

              Group

    Paul        O
    Robert      I

What I want is for df_a Actioncode to show 5.0 if the name is in df_b and its Group in df_b is 'O', and to show 3.0 if the name is in df_b and its Group in df_b is 'I'.

So the result would be:

    df_a

              Actioncode   Group

    Mary         1.0         I
    Paul         5.0         I
    Robert       3.0         O
    David        4.0         O
    Julia        4.0         O

I've tried where but can't seem to get it to work.

df_a['Actioncode'] =  df_a['Actioncode'].where(df_b['Group'] == 'O', 5.0)

But it's not quite right.

I can iterate, but that's not Pythonic.

Insights?

Thanks,

Answer 1

You can use np.select for this, which works like np.where but with multiple conditions / outputs:

import numpy as np

# Transform index of df_a to a Series for mapping
a_idx = df_a.index.to_series()

# Condition that df_a's index is in df_b
idx_in = a_idx.isin(df_b.index)

# Map df_a's index to the df_b groups (NaN where the name is not in df_b)
mapped = a_idx.map(df_b.Group)

# Apply np.select on your conditions:
conds = [(idx_in) & (mapped == 'O'),
         (idx_in) & (mapped == 'I')]

choices = [5, 3]

df_a['Actioncode'] = np.select(conds, choices, df_a.Actioncode)

>>> df_a
        Actioncode Group
Mary           1.0     I
Paul           5.0     I
Robert         3.0     O
David          4.0     O
Julia          4.0     O
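
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the tables in the question, and it drops the isin check on the assumption that names missing from df_b map to NaN, which never compares equal to 'O' or 'I':

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question's tables (assumed construction)
df_a = pd.DataFrame(
    {"Actioncode": [1.0, 1.0, 4.0, 4.0, 4.0],
     "Group": ["I", "I", "O", "O", "O"]},
    index=["Mary", "Paul", "Robert", "David", "Julia"],
)
df_b = pd.DataFrame({"Group": ["O", "I"]}, index=["Paul", "Robert"])

# Look up each df_a name in df_b; names absent from df_b become NaN
mapped = df_a.index.to_series().map(df_b["Group"])

# NaN == 'O' is False, so rows not in df_b fall through to the default
conds = [mapped == "O", mapped == "I"]
choices = [5.0, 3.0]

df_a["Actioncode"] = np.select(conds, choices, default=df_a["Actioncode"])
print(df_a)
```

Only Paul and Robert should change; every other row keeps its original Actioncode through the default argument.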

Answer 2

Another option with np.where and mapping.

scores = pd.Series(df_a.index).map(df_b['Group'].map({'O': 5.0, 'I': 3.0}))
df_a['Actioncode'] = np.where(scores.isnull(), df_a['Actioncode'], scores)

Details:

>>> df_a
        Actioncode Group
Mary           1.0     I
Paul           1.0     I
Robert         4.0     O
David          4.0     O
Julia          4.0     O
>>> scores = pd.Series(df_a.index).map(df_b['Group'].map({'O': 5.0, 'I': 3.0}))
>>> scores
0    NaN
1    5.0
2    3.0
3    NaN
4    NaN
dtype: float64
>>> 
>>> where = np.where(scores.isnull(), df_a['Actioncode'], scores)
>>> where
array([1., 5., 3., 4., 4.])
>>>
>>> df_a['Actioncode'] = where
>>> df_a
        Actioncode Group
Mary           1.0     I
Paul           5.0     I
Robert         3.0     O
David          4.0     O
Julia          4.0     O
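
A pandas-only variant of the same mapping idea, offered as a sketch rather than as part of the answer above: it relies on index alignment via reindex and fillna instead of np.where. The DataFrame construction is assumed from the question's tables, and new_codes is a name introduced here for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question's tables (assumed construction)
df_a = pd.DataFrame(
    {"Actioncode": [1.0, 1.0, 4.0, 4.0, 4.0],
     "Group": ["I", "I", "O", "O", "O"]},
    index=["Mary", "Paul", "Robert", "David", "Julia"],
)
df_b = pd.DataFrame({"Group": ["O", "I"]}, index=["Paul", "Robert"])

# Translate df_b's groups into the replacement codes, indexed by name
new_codes = df_b["Group"].map({"O": 5.0, "I": 3.0})

# Align the new codes to df_a's index (NaN for names not in df_b),
# then fall back to the existing Actioncode where nothing was mapped
df_a["Actioncode"] = new_codes.reindex(df_a.index).fillna(df_a["Actioncode"])
print(df_a)
```

Because both Series are labelled by name, the alignment is done on the index rather than by position, so the row order of df_b does not matter.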

Answer 1
2 2018-11-27 19:45:37

Answer 2
2 2018-11-27 20:17:57