Select rows from a DataFrame based on values in another DataFrame and update one column with values from the second DataFrame
I have two DataFrames, df and df1.
The main DataFrame is as follows:
df:
start end price
0 A Z 1
1 B Y 2
2 C X 3
3 A Z 4
4 D W 5
Second DataFrame:
df1:
start end price
0 A Z 100
1 B Y 200
I want to update the 'price' column of the main DataFrame df based on the start and end values in df1: every row of df whose start and end match a row in df1 should get that row's price. The expected result is:
df:
start end price
0 A Z 100
1 B Y 200
2 C X 3
3 A Z 100
4 D W 5
(All A-Z and B-Y rows in df should be updated.) Is there any way I can get this output?
In reality the DataFrames have more columns, but I want to update only one column (e.g. 'price').
First, you can merge:
s = df.merge(df1, on=['start', 'end'], how='left')
Then you can fillna and index your desired columns:
s.assign(price=s.price_y.fillna(s.price_x))[['start', 'end', 'price']]
start end price
0 A Z 100.0
1 B Y 200.0
2 C X 3.0
3 A Z 100.0
4 D W 5.0
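A self-contained sketch of the merge-then-fillna approach above, assuming the question's df and df1 are rebuilt by hand from their printed output (the constructor code is an assumption; the question only shows the printed frames). The price_x/price_y names are pandas' default suffixes for a column that exists in both frames of a merge.

```python
import pandas as pd

# The question's frames, rebuilt from the printed output (assumed constructors)
df = pd.DataFrame({
    "start": ["A", "B", "C", "A", "D"],
    "end": ["Z", "Y", "X", "Z", "W"],
    "price": [1, 2, 3, 4, 5],
})
df1 = pd.DataFrame({"start": ["A", "B"], "end": ["Z", "Y"], "price": [100, 200]})

# Left merge keeps every row of df; the two 'price' columns become
# price_x (from df) and price_y (from df1)
s = df.merge(df1, on=["start", "end"], how="left")

# Take df1's price where a match exists, otherwise fall back to df's price
result = s.assign(price=s["price_y"].fillna(s["price_x"]))[["start", "end", "price"]]
print(result)
```

Because price_y holds NaN for unmatched rows, the resulting column is float (100.0 rather than 100). If df has extra columns, selecting only ['start', 'end', 'price'] drops them; dropping price_x and price_y after the assign keeps them instead.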
Using update:
df=df.set_index(['start','end'])
df.update(df1.set_index(['start','end']))
df.reset_index()
start end price
0 A Z 100.0
1 B Y 200.0
2 C X 3.0
3 A Z 100.0
4 D W 5.0
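The update answer works by aligning both frames on a (start, end) MultiIndex. Below is a sketch of the same alignment written as an explicit lookup; this is a swapped-in technique, not the answerer's code. It writes only the price column, so other columns stay untouched, as the question requires. The qty column is hypothetical, standing in for the extra columns the question mentions.

```python
import pandas as pd

# Sample data from the question; 'qty' is a hypothetical extra column
df = pd.DataFrame({
    "start": ["A", "B", "C", "A", "D"],
    "end": ["Z", "Y", "X", "Z", "W"],
    "price": [1, 2, 3, 4, 5],
    "qty": [10, 20, 30, 40, 50],
})
df1 = pd.DataFrame({"start": ["A", "B"], "end": ["Z", "Y"], "price": [100, 200]})

# New prices keyed by (start, end); df1 must not repeat a key for reindex to work
lookup = df1.set_index(["start", "end"])["price"]

# One (start, end) key per row of df; repeats such as A-Z are fine on this side
keys = pd.MultiIndex.from_frame(df[["start", "end"]])

# Look up every row's key: NaN where df1 has no matching pair
matched = pd.Series(lookup.reindex(keys).to_numpy(), index=df.index)

# Overwrite only 'price'; rows without a match keep their old value
df["price"] = matched.fillna(df["price"])
print(df)
```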
Using merge:
df.drop(columns='price').merge(df1, on=['start', 'end'], how='left').fillna(df)
start end price
0 A Z 100.0
1 B Y 200.0
2 C X 3.0
3 A Z 100.0
4 D W 5.0
I want to merge on ['start', 'end'], and that pesky price column is going to get in my way, so I drop it. I also need to keep every row of df, including the repeated 'A'/'Z' pair, so I use a 'left' merge. Finally, I fillna with df to restore the original prices for the rows that had no match in df1.
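A step-by-step sketch of the drop/merge/fillna answer, assuming the same hand-built df and df1 as in the question. It prints the intermediate merge so the NaN gaps that fillna closes are visible.

```python
import pandas as pd

df = pd.DataFrame({
    "start": ["A", "B", "C", "A", "D"],
    "end": ["Z", "Y", "X", "Z", "W"],
    "price": [1, 2, 3, 4, 5],
})
df1 = pd.DataFrame({"start": ["A", "B"], "end": ["Z", "Y"], "price": [100, 200]})

# Step 1: drop df's own price so the merge does not produce price_x / price_y
merged = df.drop(columns="price").merge(df1, on=["start", "end"], how="left")
# Rows with no (start, end) match in df1 (C-X and D-W) now have NaN prices
print(merged)

# Step 2: fill those gaps from df; fillna aligns on row index and column names,
# which lines up here because df1 has no duplicate (start, end) pairs, so the
# left merge keeps df's row count and order
result = merged.fillna(df)
print(result)
```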