Python Pandas Update a Dataframe Column with a Value based on a Column in Another Dataframe with Overlapping Indices
This probably has a straightforward answer, but somehow I'm not seeing it.
I have two dataframes, df_a and df_b. df_b.index is a subset of df_a.index.
df_a
Actioncode Group
Mary 1.0 I
Paul 1.0 I
Robert 4.0 O
David 4.0 O
Julia 4.0 O
Note that Group pertains to an Actioncode (it just makes the Actioncode readable).
df_b
Group
Paul O
Robert I
What I want is for df_a's Actioncode to show 5.0 if the name is in df_b and df_b's Group is 'O', and 3.0 if the name is in df_b and df_b's Group is 'I'.
So the result would be:
df_a
Actioncode Group
Mary 1.0 I
Paul 5.0 I
Robert 3.0 O
David 4.0 O
Julia 4.0 O
I've tried where but can't seem to get it to work:
df_a['Actioncode'] = df_a['Actioncode'].where(df_b['Group'] == 'O', 5.0)
But it's not quite right.
I can iterate, but that's not Pythonic.
Insights?
Thanks,
You can use np.select for this, which works like np.where but with multiple conditions / outputs:
import numpy as np

# Transform index of df_a to a Series for mapping
a_idx = df_a.index.to_series()
# Condition that df_a's index is in df_b
idx_in = a_idx.isin(df_b.index)
# Map df_a's index to the df_b groups (NaN where the name is not in df_b)
mapped = a_idx.map(df_b.Group)
# Apply np.select on your conditions; rows matching neither keep their Actioncode
conds = [idx_in & (mapped == 'O'),
         idx_in & (mapped == 'I')]
choices = [5, 3]
df_a['Actioncode'] = np.select(conds, choices, df_a.Actioncode)
>>> df_a
Actioncode Group
Mary 1.0 I
Paul 5.0 I
Robert 3.0 O
David 4.0 O
Julia 4.0 O
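For anyone who wants to reproduce this end to end, here is a self-contained sketch of the same np.select approach. The way the two frames are constructed is an assumption based on the tables in the question; the original post does not show how they were built.

```python
import numpy as np
import pandas as pd

# Assumed construction of the example frames (taken from the printed tables).
df_a = pd.DataFrame(
    {"Actioncode": [1.0, 1.0, 4.0, 4.0, 4.0],
     "Group": ["I", "I", "O", "O", "O"]},
    index=["Mary", "Paul", "Robert", "David", "Julia"],
)
df_b = pd.DataFrame({"Group": ["O", "I"]}, index=["Paul", "Robert"])

# Look up each df_a name in df_b; names absent from df_b map to NaN.
a_idx = df_a.index.to_series()
idx_in = a_idx.isin(df_b.index)
mapped = a_idx.map(df_b["Group"])

# First matching condition wins; rows matching none fall back to the default.
conds = [idx_in & (mapped == "O"), idx_in & (mapped == "I")]
choices = [5.0, 3.0]
df_a["Actioncode"] = np.select(conds, choices, default=df_a["Actioncode"])

print(df_a)
```

Only Actioncode is overwritten; df_a's own Group column is left as it was, which matches the expected result in the question.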
Another option, with np.where and mapping:
scores = pd.Series(df_a.index).map(df_b['Group'].map({'O': 5.0, 'I': 3.0}))
df_a['Actioncode'] = np.where(scores.isnull(), df_a['Actioncode'], scores)
Details:
>>> df_a
Actioncode Group
Mary 1.0 I
Paul 1.0 I
Robert 4.0 O
David 4.0 O
Julia 4.0 O
>>> scores = pd.Series(df_a.index).map(df_b['Group'].map({'O': 5.0, 'I': 3.0}))
>>> scores
0 NaN
1 5.0
2 3.0
3 NaN
4 NaN
dtype: float64
>>>
>>> where = np.where(scores.isnull(), df_a['Actioncode'], scores)
>>> where
array([1., 5., 3., 4., 4.])
>>>
>>> df_a['Actioncode'] = where
>>> df_a
Actioncode Group
Mary 1.0 I
Paul 5.0 I
Robert 3.0 O
David 4.0 O
Julia 4.0 O
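A pandas-only variation on the same idea, which is not from the answers above: map df_b's groups to codes, reindex onto df_a's index, and fall back to the existing Actioncode wherever there is no match. This is a minimal sketch; the frame construction and the name new_codes are assumptions for illustration.

```python
import pandas as pd

# Assumed construction of the example frames (taken from the printed tables).
df_a = pd.DataFrame(
    {"Actioncode": [1.0, 1.0, 4.0, 4.0, 4.0],
     "Group": ["I", "I", "O", "O", "O"]},
    index=["Mary", "Paul", "Robert", "David", "Julia"],
)
df_b = pd.DataFrame({"Group": ["O", "I"]}, index=["Paul", "Robert"])

# Translate df_b's groups into replacement codes, indexed by name.
new_codes = df_b["Group"].map({"O": 5.0, "I": 3.0})

# Align on df_a's index; names missing from df_b become NaN and
# are then filled with the Actioncode that df_a already has.
df_a["Actioncode"] = new_codes.reindex(df_a.index).fillna(df_a["Actioncode"])

print(df_a)
```

Because everything aligns on the index labels rather than on position, this does not depend on the row order of either frame. A name in df_b whose Group is neither 'O' nor 'I' would also map to NaN and so keep its original Actioncode.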