pandas set value in column based on another dataframe column
Imagine I have two pandas DataFrames:
import pandas as pd
df1 = pd.DataFrame({'y1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'y2': [3, 1, 2, 6]})
What I want: if a value in y2 is greater than the corresponding value in y1, set that value in df2['y2'] to the corresponding df1['y1']. When I try selecting the matching rows like:
df2[df2['y2'] > df1['y1']]
This returns True rather than the index. I was hoping to do something like:
df2[df2['y2'] > df1['y1']]['y2'] = df1['y1']
If both DataFrames have the same index:
Use DataFrame.loc:
df2.loc[df2['y2'] > df1['y1'], 'y2'] = df1['y1']
print(df2)
   y2
0   1
1   1
2   2
3   4
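The .loc approach above relies on both DataFrames sharing the same index. As a minimal sketch of what one might do when they do not, the example below assumes the rows correspond by position (an assumption not stated in the question) and gives df1 a made-up index purely for illustration. It compares the underlying NumPy arrays so that no label alignment takes place.

```python
import pandas as pd

# Hypothetical setup: df1 carries a different index than df2.
df1 = pd.DataFrame({'y1': [1, 2, 3, 4]}, index=[10, 11, 12, 13])
df2 = pd.DataFrame({'y2': [3, 1, 2, 6]})

# Assumption: row i of df1 belongs with row i of df2 (positional match).
# Working on plain arrays sidesteps label alignment between the two frames.
y1 = df1['y1'].to_numpy()
too_big = df2['y2'].to_numpy() > y1   # boolean array, True where y2 exceeds y1

# Overwrite only the flagged rows with the matching y1 values.
df2.loc[too_big, 'y2'] = y1[too_big]
print(df2)
```

If the rows are instead meant to be matched by label, aligning the frames first (for example with reindex or a merge) is likely the safer route.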
Or Series.where, Series.mask:
df2['y2'] = df1['y1'].where(df2['y2'].gt(df1['y1']), df2['y2'])
df2['y2'] = df2['y2'].mask(df2['y2'].gt(df1['y1']), df1['y1'])
print(df2)
   y2
0   1
1   1
2   2
3   4
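The two lines above are alternatives rather than steps to run in sequence. Series.where keeps the caller's value where the condition is True, while Series.mask replaces it there. A small sketch, using the same data as the question and the illustrative names cond, via_where and via_mask, shows both producing the same column:

```python
import pandas as pd

df1 = pd.DataFrame({'y1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'y2': [3, 1, 2, 6]})

# True where y2 exceeds y1
cond = df2['y2'].gt(df1['y1'])

# where: keep y1 where cond is True, otherwise fall back to y2
via_where = df1['y1'].where(cond, df2['y2'])

# mask: replace y2 with y1 where cond is True, otherwise keep y2
via_mask = df2['y2'].mask(cond, df1['y1'])

print(via_where.tolist())
print(via_mask.tolist())
```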
Use numpy.where:
In [233]: import numpy as np
In [234]: df1 = pd.DataFrame({'y1': [1, 2, 3, 4]})
In [236]: df2 = pd.DataFrame({'y2': [3, 1, 2, 6]})
In [242]: df2['y2'] = np.where(df2.y2.gt(df1.y1), df1.y1, df2.y2)
In [243]: df2
Out[243]:
y2
0 1
1 1
2 2
3 4
np.minimum
Maintain all of the existing df2, but with updated column values in 'y2':
df2.assign(y2=np.minimum(df1.y1, df2.y2))
   y2
0   1
1   1
2   2
3   4
Or just a new DataFrame with one column:
pd.DataFrame({'y2': np.minimum(df1.y1, df2.y2)})
   y2
0   1
1   1
2   2
3   4
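One detail worth noting about the assign variant: it returns a new DataFrame and leaves df2 itself untouched, so the result has to be bound to a name to be kept. The sketch below adds a made-up 'other' column to df2 (not part of the question) to show that the remaining columns carry over unchanged.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'y1': [1, 2, 3, 4]})
# 'other' is a hypothetical extra column, added only for illustration.
df2 = pd.DataFrame({'y2': [3, 1, 2, 6], 'other': list('abcd')})

# Element-wise minimum of the two columns; assign returns a copy of df2
# with 'y2' replaced and every other column preserved.
result = df2.assign(y2=np.minimum(df1['y1'], df2['y2']))

print(result)
print(df2['y2'].tolist())  # the original df2 is not modified
```

To update df2 itself, the result can be assigned back, e.g. df2 = df2.assign(...).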