pandas combine_first with particular index columns?

Question

I'm trying to join two dataframes in pandas on a specified column without adding redundant columns to the result. This is analogous to combine_first, except that combine_first does not seem to take an optional argument naming the column to match on. Example:

# combine df1 and df2 based on "id" column
df1 = pandas.merge(df1, df2, how="outer", on=["id"])

The problem with the above is that columns common to df1 and df2, aside from "id", will be added twice (with _x and _y suffixes) to df1. How can I do something like:

# Do outer join from df2 to df1, matching items by "id" but not adding
# columns that are redundant (df1 takes precedence if the values disagree)
df1.combine_first(df2, on=["id"])

How can this be done?

Answer 1

If you are trying to merge columns from df2 into df1 while excluding any redundant columns, the following should work.

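df1.set_index("id", inplace=True)
df2.set_index("id", inplace=True)
# Keep only the df2 columns that df1 does not already have.
# (.ix and Index subtraction as set difference were removed from later pandas versions.)
extra_cols = df2.columns.difference(df1.columns)
df3 = df1.merge(df2[extra_cols], left_index=True, right_index=True, how="outer")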

However, this will not update any values in df1 with values from df2, because it only brings in the non-redundant columns. Rows that exist only in df2 will also have NaN in the columns the two frames share. But since you said df1 should take precedence wherever the values disagree, perhaps this will do the trick?
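If the goal is the combine_first behavior keyed on "id", one possible sketch is to set "id" as the index on both frames and then call combine_first, which aligns on the index. The sample frames and their values below are hypothetical and only serve to illustrate the idea; this is not necessarily what the original answer had in mind.

```python
import pandas as pd

# Hypothetical sample data: "a" is shared by both frames, "b" exists only in df2.
df1 = pd.DataFrame({"id": [1, 2], "a": [10.0, 20.0]})
df2 = pd.DataFrame({"id": [2, 3], "a": [99.0, 30.0], "b": ["x", "y"]})

# Index both frames by the key so that combine_first aligns rows on "id".
left = df1.set_index("id")
right = df2.set_index("id")

# Values from df1 take precedence; df2 only fills missing cells,
# rows that df1 lacks, and columns that df1 lacks.
combined = left.combine_first(right).reset_index()
print(combined)
```

For id 2 the value of "a" stays at 20.0 from df1 rather than 99.0 from df2, while the row for id 3 and the column "b" come from df2. No _x/_y columns are created because shared columns are merged cell by cell instead of being duplicated.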

Answer 1: 1 ACCEPTED 2013-03-28 01:43:55
