How to merge and update pandas dataframes
I'm sorry if this has been asked before, but I wasn't sure how to word this question for a search.
I have two data frames, each with a year column and a value column. I want to update the first data frame by matching on year and keeping whichever value is larger.
Suppose the data frames look like this:
>>> import pandas as pd
>>> x = [1999, 2000, 2001]
>>> y = [0, 0, 0]
>>> df1 = pd.DataFrame({'year': x, 'value': y})
>>> df1
   year  value
0  1999      0
1  2000      0
2  2001      0
>>> x2 = [1999, 2003, 2004]
>>> y2 = [5, 0, 0]
>>> df2 = pd.DataFrame({'year': x2, 'value': y2})
>>> df2
   year  value
0  1999      5
1  2003      0
2  2004      0
I want the updated data frame (df1) to look like this. Is there a simple way to do this?
   year  value
0  1999      5
1  2000      0
2  2001      0
Using merge and map:
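# merge on year; the two value columns become value_x and value_y
df = df1.merge(df2, on=['year'], how='outer')
# row-wise maximum of the value columns (NaN is skipped)
df['max'] = df.filter(like='value').max(axis=1)
# map each year in df1 to its maximum
df1['value'] = df1['year'].map(df.set_index('year')['max'])
print(df1)
   year  value
0  1999    5.0
1  2000    0.0
2  2001    0.0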
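The result above comes back as floats because the outer merge introduces NaN for years missing from one frame. If the integer dtype matters, one possible variation is sketched below. It is not part of the original answer; the names `matched`, `larger` and `original_dtype` are illustrative, and it assumes each year appears at most once in df2.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
df2 = pd.DataFrame({'year': [1999, 2003, 2004], 'value': [5, 0, 0]})

original_dtype = df1['value'].dtype

# Look up df2's value for each year in df1 (NaN where df2 has no such year).
# Assumes each year appears at most once in df2.
matched = df1['year'].map(df2.set_index('year')['value'])

# Row-wise maximum; NaN is skipped, so unmatched years keep df1's value.
larger = pd.concat([df1['value'], matched], axis=1).max(axis=1)

# Cast back so the column keeps its original integer dtype.
df1['value'] = larger.astype(original_dtype)
print(df1)
```

Years that exist only in df2 (2003 and 2004 here) are ignored, which matches the requested output.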
EDIT: To see which rows were changed, use:
# copy the `value` column into a temporary `temp` column
df1['temp'] = df1['value']
# now use the above code to change the `value` column
# check which rows changed with respect to the `temp` column
df1['Changed_Values'] = df1['temp'].ne(df1['value'])
# finally drop the temporary column
df1.drop('temp', axis=1, inplace=True)
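Put together in order, the steps of the EDIT might look like the sketch below. It reuses the merge-and-map update from this answer; the only change is that the original values are held in a separate Series (`original`, an illustrative name) rather than a temporary column.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
df2 = pd.DataFrame({'year': [1999, 2003, 2004], 'value': [5, 0, 0]})

# Keep a copy of the values before the update.
original = df1['value'].copy()

# Update step from the answer: per-year maximum of the two value columns.
df = df1.merge(df2, on='year', how='outer')
df['max'] = df.filter(like='value').max(axis=1)
df1['value'] = df1['year'].map(df.set_index('year')['max'])

# True where the updated value differs from the original one.
df1['Changed_Values'] = original.ne(df1['value'])
print(df1)
```

Only the 1999 row is flagged, since it is the only year where df2 holds a larger value.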
Why not just do:
if df1.value.sum() < df2.value.sum():
    df1.value = df2.value
Or:
if df1['value'].sum() < df2['value'].sum():
    df1['value'] = df2['value']
Now:
print(df1)
Is:
   year  value
0  1999      5
1  2000      0
2  2001      0
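Note that the sum comparison reproduces the expected output for the sample data, but it replaces the whole column by row position rather than by year. The sketch below uses a hypothetical reordered df2 (not from the question) to show where the two diverge; `by_position` and `by_year` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
# Hypothetical variant of df2: same data, but 1999 is no longer the first row.
df2 = pd.DataFrame({'year': [2003, 1999, 2004], 'value': [0, 5, 0]})

# Sum-based approach: the column is copied by row index, not by year.
by_position = df1.copy()
if by_position['value'].sum() < df2['value'].sum():
    by_position['value'] = df2['value']
print(by_position)  # the 5 ends up on the 2000 row

# Year-based lookup for comparison: the 5 is found for 1999.
by_year = df1['year'].map(df2.set_index('year')['value'])
print(by_year)
```

If the rows of the two frames are not guaranteed to line up, a year-based lookup such as the merge-and-map answer is probably the safer choice.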