How to merge and update pandas dataframes

I'm sorry if this has been asked before, but I wasn't sure how to word this question for a search.

I have 2 data frames, each with a year column and a value column. I want to update the first data frame by matching on year and setting the value column to whichever of the two values is larger. Suppose the data frames look like this:

>>> import pandas as pd
>>> x = [1999, 2000, 2001]
>>> y = [0, 0, 0]
>>> df1 = pd.DataFrame({'year': x, 'value': y})
>>> df1

   year   value
0  1999   0
1  2000   0
2  2001   0

>>> x2 = [1999, 2003, 2004]
>>> y2 = [5, 0, 0]
>>> df2 = pd.DataFrame({'year': x2, 'value': y2})
>>> df2

   year   value
0  1999   5
1  2003   0
2  2004   0

I want the updated data frame (df1) to look like this. Is there a simple way to do this?

   year   value
0  1999   5
1  2000   0
2  2001   0

Using merge and map:

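# outer merge on year; the two value columns become value_x and value_y
df = df1.merge(df2, on=['year'], how='outer')
# row-wise maximum across the value columns (NaN is skipped)
df['max'] = df.filter(like='value').max(axis=1)
# map each year in df1 to its maximum
df1['value'] = df1['year'].map(df.set_index('year')['max'])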

print(df1)
   year  value
0  1999    5.0
1  2000    0.0
2  2001    0.0
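The values above come back as floats because the outer merge introduces NaN for years that exist in only one frame. If the integer dtype matters, one possible variation is sketched below. It assumes each year appears at most once in df2, and the names `matched` and `larger` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
df2 = pd.DataFrame({'year': [1999, 2003, 2004], 'value': [5, 0, 0]})

# Look up df2's value for every year in df1 (NaN where df2 has no such year).
# Assumption: each year appears at most once in df2.
matched = df1['year'].map(df2.set_index('year')['value'])

# Row-wise maximum of the two candidates; max() skips NaN by default.
larger = pd.concat([df1['value'], matched], axis=1).max(axis=1)

# Cast back to the original integer dtype.
df1['value'] = larger.astype(df1['value'].dtype)
print(df1)
```

With the sample data, value stays an integer column holding 5, 0, 0.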

EDIT: To know which rows were changed, use:

# copy the `value` column into a temporary `temp` column
df1['temp'] = df1['value']
# now run the merge/map code above to update the `value` column
# then flag the rows whose `value` differs from `temp`
df1['Changed_Values'] = df1['temp'].ne(df1['value'])
# finally, drop the temporary column
df1.drop('temp', axis=1, inplace=True)
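Put together with the sample data from the question, the whole sequence might look like the sketch below. As a small variation, it keeps the original values in a separate Series (here called `original`, an illustrative name) instead of a temporary column, so nothing has to be dropped afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
df2 = pd.DataFrame({'year': [1999, 2003, 2004], 'value': [5, 0, 0]})

# Keep a copy of the original values to compare against later.
original = df1['value'].copy()

# Same merge-and-map update as in the answer.
merged = df1.merge(df2, on='year', how='outer')
merged['max'] = merged.filter(like='value').max(axis=1)
df1['value'] = df1['year'].map(merged.set_index('year')['max'])

# True where the update changed the value.
df1['Changed_Values'] = original.ne(df1['value'])
print(df1)
```

For this data, only the 1999 row is flagged as changed.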

Why not just do:

if df1.value.sum() < df2.value.sum():
    df1.value = df2.value

Or:

if df1['value'].sum() < df2['value'].sum():
    df1['value'] = df2['value']

Now:

print(df1)

Is:

   year  value
0  1999      5
1  2000      0
2  2001      0
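One thing worth checking with this approach: the assignment aligns the two columns on the row index, not on year, so it gives the desired result here because 1999 happens to sit in row 0 of both frames. The sketch below uses a hypothetical reordering of df2 (not from the question) to show what happens otherwise.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
# Hypothetical variation of df2: same rows, but 1999 is now in row 1.
df2 = pd.DataFrame({'year': [2003, 1999, 2004], 'value': [0, 5, 0]})

by_position = df1.copy()
if by_position['value'].sum() < df2['value'].sum():
    # Assignment aligns on the index, so row 1 of df2 lands on row 1 of df1.
    by_position['value'] = df2['value']

print(by_position)
```

Here the 5 ends up on year 2000 rather than 1999. Matching on year, as in the merge and map answer, does not depend on row order.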
