How to update a pandas column where two DataFrames share the same value in another column?
Let's say I have two original DataFrames like these:
df1 = pd.DataFrame({"ID": [101, 102, 103], "Price":[12, 33, 44], "something":[12,22,11]})
df2 = pd.DataFrame({"ID": [101, 103], "Price":[122, 133]})
They display as:
ID Price something
0 101 12 12
1 102 33 22
2 103 44 11
and
ID Price
0 101 122
1 103 133
Since I haven't set an index on any column, I want to know how I can update df1 wherever both DataFrames have the same ID. For this sample, I hope to get a result like:
ID Price something
0 101 122 12
1 102 33 22
2 103 133 11
As you can see, I only care about the Price column. What I have tried so far:
pd.concat([df1,df2]).drop_duplicates(['ID'],keep='last')
But it just shows me:
ID Price something
1 102 33 22.0
0 101 122 NaN
1 103 133 NaN
I don't want the values in any other column to change. I'd also like to keep the order of the rows of df1.
UPDATE
After running the answer code and experimenting further, I found that the order of the columns changes because we use reset_index, which moves the index back in as the first column. I hope someone can show me how to keep the original column positions of my DataFrame. For now, it looks like this:
In [180]: df1 = pd.DataFrame({"ss":[12,22,11], "ID": [101, 102, 103], "Price":[12, 33, 44], "something":[12,22,11]})
...: df2 = pd.DataFrame({"ID": [101, 103], "Price":[122, 133]})
In [181]: df1.set_index('ID',inplace=True)
...: df1.update(df2.set_index('ID'))
...: df1.reset_index(inplace=True)
In [182]: df1
Out[182]:
ID ss Price something
0 101 12 122.0 12
1 102 22 33.0 22
2 103 11 133.0 11
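One way to keep the original column positions, sketched here as an assumption rather than something taken from the answers, is to remember the column order before calling set_index and reselect the columns in that order after reset_index. The names original_columns, indexed and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"ss": [12, 22, 11], "ID": [101, 102, 103],
                    "Price": [12, 33, 44], "something": [12, 22, 11]})
df2 = pd.DataFrame({"ID": [101, 103], "Price": [122, 133]})

# Remember the column order before ID is moved into the index.
original_columns = list(df1.columns)

indexed = df1.set_index("ID")
indexed.update(df2.set_index("ID"))  # overwrite Price where the ID matches

# reset_index puts ID first; reselecting restores the original order.
result = indexed.reset_index()[original_columns]
print(result)
```

This leaves df1 itself untouched and returns the updated frame as result, with rows in the same order as df1.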
Use np.where and isin to update the price in df1 after a merge:
df1.Price=np.where(df1.ID.isin(df2.ID),df1.merge(df2,on='ID',how='left')['Price_y'],df1.Price)
df1
ID Price something
0 101 122.0 12
1 102 33.0 22
2 103 133.0 11
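A related sketch, not part of the original answer, does the same lookup with Series.map instead of a merge. It assumes the ID values in df2 are unique; new_prices is an illustrative name. Because nothing is moved into the index, the row and column order of df1 stay as they were, and the cast at the end keeps Price as an integer column.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [101, 102, 103], "Price": [12, 33, 44],
                    "something": [12, 22, 11]})
df2 = pd.DataFrame({"ID": [101, 103], "Price": [122, 133]})

# Look up each ID of df1 in df2; IDs missing from df2 give NaN.
# Assumes every ID appears at most once in df2.
new_prices = df1["ID"].map(df2.set_index("ID")["Price"])

# Fall back to the old price where there was no match, then restore the dtype.
df1["Price"] = new_prices.fillna(df1["Price"]).astype(df1["Price"].dtype)
print(df1)
```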
Using update:
df1.set_index('ID',inplace=True)
df1.update(df2.set_index('ID'))
df1.reset_index(inplace=True)
df1
ID Price something
0 101 122.0 12
1 102 33.0 22
2 103 133.0 11
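In the output above, Price is displayed as a float (122.0 rather than 122). If the integer type matters, a minimal sketch is to record the dtype before the update and cast back afterwards. This assumes no NaN ends up in the column, which holds here because update only overwrites with non-missing values; original_dtype is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [101, 102, 103], "Price": [12, 33, 44],
                    "something": [12, 22, 11]})
df2 = pd.DataFrame({"ID": [101, 103], "Price": [122, 133]})

original_dtype = df1["Price"].dtype  # integer dtype before the update

df1.set_index("ID", inplace=True)
df1.update(df2.set_index("ID"))
df1.reset_index(inplace=True)

# Cast back in case the update turned the column into floats.
df1["Price"] = df1["Price"].astype(original_dtype)
print(df1)
```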
Another possible solution is to use combine_first():
df2.set_index(['ID']).combine_first(df1.set_index(['ID', 'something'])).reset_index()
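You can also use isin(). Note that this assigns by position, so it assumes every ID in df2 exists in df1 and that the matching rows appear in the same order in both DataFrames: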
df1.loc[df1.ID.isin(df2.ID), ['Price']] = df2[['Price']].values
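If df2 may be in a different order or contain IDs that df1 lacks, the positional assignment above can mismatch or fail. A hedged variant, assuming the IDs in df2 are unique, looks each price up by ID instead; the reordered df2 with the extra ID 999 is made up for illustration, and mask and lookup are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [101, 102, 103], "Price": [12, 33, 44],
                    "something": [12, 22, 11]})
# Hypothetical df2: rows in a different order, plus an ID absent from df1.
df2 = pd.DataFrame({"ID": [103, 101, 999], "Price": [133, 122, 1]})

mask = df1["ID"].isin(df2["ID"])      # rows of df1 that have a match
lookup = df2.set_index("ID")["Price"]  # price keyed by ID (IDs assumed unique)

# Fetch prices in the order of df1's matching rows, then assign by position.
df1.loc[mask, "Price"] = lookup.loc[df1.loc[mask, "ID"]].to_numpy()
print(df1)
```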