
Updating dataframe column with values from a different dataframe

I have two dataframes and I want to update one of them based on matching ids from the second, but only in one column. Both dataframes contain other columns that I am not concerned with and that should not be updated.

For example:

df1

id    name  ...
123   city a
456   city b
789   city c
789   city c
456   city b
123   city a
.
.
.

so on and so forth

df2

id    name  ...
123   City A
456   City B
789   City C
.
.
.

So the resulting df_new should be:

id    name  ...
123   City A
456   City B
789   City C
789   City C
456   City B
123   City A
.
.
.

Right now what I'm doing is:

df2 = df2.set_index('id')
df1 = df.set_index('id')
df1.update(df2)
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

So I tried to write this into a function like this:

def replace_names(df1, df2, 'id'):
    df2 = df2.set_index('id')
    df1 = df.set_index('id')
    df1.update(df2)
    df1.reset_index(inplace=True)
    df2.reset_index(inplace=True)
    return df

However, when used as a function, this seems to return a NoneType instead of a dataframe.

I would also like to avoid setting and resetting the index on both dataframes. Is there a way to pass the id and name columns of the two dataframes as variables and apply the update only to those two columns, without changing any of the other columns in either dataframe?

Any help would be appreciated!

Change your function to:

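def replace_names(df1, df2, idxCol='id', srcCol='name', dstCol='name'):
    # Index df1 by the join column so that update() aligns rows on it.
    df1 = df1.set_index(idxCol)
    # Update only the destination column; all other columns stay untouched.
    df1[dstCol].update(df2.set_index(idxCol)[srcCol])
    return df1.reset_index()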

Parameters:

  • df1 - the DataFrame with the column to be updated.
  • df2 - the DataFrame with the source column.
  • idxCol - name of the "join" column.
  • srcCol - name of the source column (in df2).
  • dstCol - name of the destination column (in df1).

Then call it:

df_new = replace_names(df1, df2)

Your code failed because:

  • In df1 = df.set_index('id') you refer to df (an external variable, despite having both DataFrames specified as parameters).
  • In df1.update(df2) you update all columns, whereas you want to update only one column.
  • Your function returned df - a variable without any connection to what this function did so far (maybe it was an empty DataFrame).
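
The question also asks for a way to avoid setting and resetting the index. One possible sketch, which is not part of the answer above, builds an id-to-name lookup from df2 and applies it with Series.map. The function name replace_names_map, the snake_case parameter names, and the extra "other" column are hypothetical and used only for illustration; the sketch assumes ids in df2 are unique (if they are not, the first occurrence is used). It may also be a safer choice on recent pandas versions with Copy-on-Write, where an in-place update through df1[dstCol] may not write back to df1.

```python
import pandas as pd


def replace_names_map(df1, df2, id_col="id", src_col="name", dst_col="name"):
    """Return a copy of df1 with dst_col replaced by df2[src_col] where ids match."""
    # Build an id -> new value lookup; duplicate ids in df2 keep the first row.
    lookup = df2.drop_duplicates(subset=id_col).set_index(id_col)[src_col]
    out = df1.copy()  # leave the caller's DataFrame unchanged
    # map() looks up each id; ids missing from df2 give NaN,
    # so fall back to the existing value in those rows.
    out[dst_col] = out[id_col].map(lookup).fillna(out[dst_col])
    return out


# Example data from the question, plus one unrelated column.
df1 = pd.DataFrame({
    "id": [123, 456, 789, 789, 456, 123],
    "name": ["city a", "city b", "city c", "city c", "city b", "city a"],
    "other": range(6),
})
df2 = pd.DataFrame({"id": [123, 456, 789], "name": ["City A", "City B", "City C"]})

df_new = replace_names_map(df1, df2)
print(df_new)
```

Only the destination column is rewritten here; the index of df1 and all of its other columns are left as they were.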

Alternatively, merge the two DataFrames on id:

final = df1.merge(df2, on=['id'], how='inner')

Then drop the columns that are not required.
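
As a rough sketch of how the merge approach could be completed for the example data: an inner merge drops rows of df1 whose id is missing from df2, so this variant assumes a left merge is wanted instead. The "_new" suffix and the "other" column are hypothetical choices for illustration, and ids in df2 are assumed to be unique (otherwise the merge would duplicate rows).

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [123, 456, 789, 789, 456, 123],
    "name": ["city a", "city b", "city c", "city c", "city b", "city a"],
    "other": range(6),
})
df2 = pd.DataFrame({"id": [123, 456, 789], "name": ["City A", "City B", "City C"]})

# A left merge keeps every row of df1 in its original order. Only the id and
# name columns of df2 are passed in, so its other columns are not pulled over.
merged = df1.merge(df2[["id", "name"]], on="id", how="left", suffixes=("", "_new"))

# Prefer the name from df2, falling back to df1's name where no id matched.
merged["name"] = merged["name_new"].fillna(merged["name"])

# Drop the helper column that is no longer required.
final = merged.drop(columns="name_new")
print(final)
```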
