
Python Pandas Update a Dataframe Column with a Value based on a Column in Another Dataframe with Overlapping Indices

This probably has a straightforward answer but somehow I'm not seeing it.

I have two dataframes, df_a and df_b. df_b.index is a subset of df_a.index.

df_a

              Actioncode   Group

    Mary         1.0         I
    Paul         1.0         I
    Robert       4.0         O
    David        4.0         O
    Julia        4.0         O

Note that Group pertains to an Actioncode (it just makes the Actioncode readable).

df_b

              Group

    Paul        O
    Robert      I

What I want is for df_a's Actioncode to show 5.0 if the name is in df_b and df_b's Group is 'O', and 3.0 if the name is in df_b and df_b's Group is 'I'.

So the result would be:

df_a

              Actioncode   Group

    Mary         1.0         I
    Paul         5.0         I
    Robert       3.0         O
    David        4.0         O
    Julia        4.0         O

I've tried where but can't seem to get it.

df_a['Actioncode'] =  df_a['Actioncode'].where(df_b['Group'] == 'O', 5.0)

But it's not quite right.

I can iterate but it's not pythonic.

Insights?

Thanks,

You can use np.select for this, which works like np.where but with multiple conditions / outputs:

import numpy as np

# Turn df_a's index into a Series so it can be mapped
a_idx = df_a.index.to_series()

# True where a name in df_a's index also appears in df_b's index
idx_in = a_idx.isin(df_b.index)

# Look up each name's Group in df_b (NaN for names not in df_b)
mapped = a_idx.map(df_b.Group)

# Conditions and their matching outputs for np.select
conds = [idx_in & (mapped == 'O'),
         idx_in & (mapped == 'I')]
choices = [5.0, 3.0]

# Rows matching no condition keep their current Actioncode
df_a['Actioncode'] = np.select(conds, choices, default=df_a.Actioncode)

>>> df_a
        Actioncode Group
Mary           1.0     I
Paul           5.0     I
Robert         3.0     O
David          4.0     O
Julia          4.0     O
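
To try this end to end, here is a minimal self-contained sketch. It rebuilds the two frames from the question (the constructor calls are an assumption about how the data was created) and applies the same np.select idea. Since the mapped value is NaN for any name missing from df_b, the comparisons are already False for those rows, so this sketch leaves out the separate isin check.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (assumed construction)
names = ["Mary", "Paul", "Robert", "David", "Julia"]
df_a = pd.DataFrame(
    {"Actioncode": [1.0, 1.0, 4.0, 4.0, 4.0],
     "Group": ["I", "I", "O", "O", "O"]},
    index=names,
)
df_b = pd.DataFrame({"Group": ["O", "I"]}, index=["Paul", "Robert"])

# Look up each df_a name in df_b; names not in df_b become NaN
mapped = df_a.index.to_series().map(df_b["Group"])

# NaN never equals 'O' or 'I', so unmatched rows fall through to the default
conds = [mapped == "O", mapped == "I"]
choices = [5.0, 3.0]

df_a["Actioncode"] = np.select(conds, choices, default=df_a["Actioncode"])
print(df_a)
```

Only Paul and Robert change; every other row keeps its current Actioncode through the default argument.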

Another option uses np.where together with a mapping from df_b's Group to the new code:

import numpy as np
import pandas as pd
scores = pd.Series(df_a.index).map(df_b['Group'].map({'O': 5.0, 'I': 3.0}))
df_a['Actioncode'] = np.where(scores.isnull(), df_a['Actioncode'], scores)

Details:

>>> df_a
        Actioncode Group
Mary           1.0     I
Paul           1.0     I
Robert         4.0     O
David          4.0     O
Julia          4.0     O
>>> scores = pd.Series(df_a.index).map(df_b['Group'].map({'O': 5.0, 'I': 3.0}))
>>> scores
0    NaN
1    5.0
2    3.0
3    NaN
4    NaN
dtype: float64
>>> 
>>> where = np.where(scores.isnull(), df_a['Actioncode'], scores)
>>> where
array([1., 5., 3., 4., 4.])
>>>
>>> df_a['Actioncode'] = where
>>> df_a
        Actioncode Group
Mary           1.0     I
Paul           5.0     I
Robert         3.0     O
David          4.0     O
Julia          4.0     O
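
A pandas-only variant is also possible. This is a sketch of an alternative, not something taken from either answer above, and the frame construction is again an assumption based on the question's tables. The where attempt in the question struggles because Series.where keeps values where the condition is True and replaces everything else, and df_b's condition only covers two of df_a's five names. Reindexing the new codes to df_a's index first makes the alignment explicit, and fillna then falls back to the existing codes.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed construction)
names = ["Mary", "Paul", "Robert", "David", "Julia"]
df_a = pd.DataFrame(
    {"Actioncode": [1.0, 1.0, 4.0, 4.0, 4.0],
     "Group": ["I", "I", "O", "O", "O"]},
    index=names,
)
df_b = pd.DataFrame({"Group": ["O", "I"]}, index=["Paul", "Robert"])

# Translate df_b's Group into the new code for each name in df_b
new_codes = df_b["Group"].map({"O": 5.0, "I": 3.0})

# Align to df_a's index; names not in df_b become NaN
aligned = new_codes.reindex(df_a.index)

# Where no new code exists, keep the current Actioncode
df_a["Actioncode"] = aligned.fillna(df_a["Actioncode"])
print(df_a)
```

Both reindex and fillna align on index labels, so the result does not depend on the row order of df_a or df_b.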
