Let's say that I have two dataframes, df1 and df2. I can do an inner and an outer join this way:
inner_df = df1.merge(df2, how="inner", left_on=col_df1, right_on=col_df2)
outer_df = df1.merge(df2, how="outer", left_on=col_df1, right_on=col_df2)
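To make the two joins concrete, here is a minimal sketch with small hypothetical frames. The key column names k1 and k2 are assumptions standing in for col_df1 and col_df2; the inner join keeps only keys found on both sides, while the outer join keeps the union of keys and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical toy data; "k1"/"k2" stand in for col_df1/col_df2
df1 = pd.DataFrame({"k1": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"k2": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner join: only the keys present in both frames ("b" and "c")
inner_df = df1.merge(df2, how="inner", left_on="k1", right_on="k2")

# Outer join: union of the keys ("a" to "d"); missing values become NaN
outer_df = df1.merge(df2, how="outer", left_on="k1", right_on="k2")

print(inner_df)
print(outer_df)
```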
The DataFrame.merge method accepts an indicator parameter: if True, a column called "_merge" is added to the output DataFrame with information on the source of each row. This column takes the value "left_only" for observations whose merge key appears only in the 'left' DataFrame, "right_only" for observations whose merge key appears only in the 'right' DataFrame, and "both" if the observation's merge key is found in both.
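A minimal sketch of what indicator=True adds, again with hypothetical toy frames (k1 and k2 stand in for col_df1 and col_df2). The "_merge" column is categorical, with the three categories described above.

```python
import pandas as pd

# Hypothetical toy data; "k1"/"k2" stand in for col_df1/col_df2
df1 = pd.DataFrame({"k1": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"k2": ["b", "c", "d"], "y": [20, 30, 40]})

outer_df = df1.merge(df2, how="outer", left_on="k1", right_on="k2", indicator=True)

# "_merge" tells where each row came from: left_only, right_only or both
print(outer_df[["k1", "k2", "_merge"]])
print(outer_df["_merge"].value_counts())
```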
I am not sure I understood correctly what this parameter does. Here is my question: are these two pieces of code equivalent?
inner_df = df1.merge(df2, how="inner", left_on=col_df1, right_on=col_df2)
outer_df = df1.merge(df2, how="outer", left_on=col_df1, right_on=col_df2,
                     indicator=True)
inner_df = outer_df[outer_df['_merge'] == 'both'].drop(columns=["_merge"])
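One way to check the question empirically is to build both frames from small hypothetical data and inspect them side by side. This sketch assumes toy frames where df1 has an integer column x, and it names the filtered result inner_df2 so it does not overwrite inner_df. With this data the columns and key pairs match, but the NaNs produced by the outer join upcast the integer columns to float, and the filtered frame keeps the outer join's index labels.

```python
import pandas as pd

# Hypothetical toy data; "k1"/"k2" stand in for col_df1/col_df2
df1 = pd.DataFrame({"k1": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"k2": ["b", "c", "d"], "y": [20, 30, 40]})

inner_df = df1.merge(df2, how="inner", left_on="k1", right_on="k2")

outer_df = df1.merge(df2, how="outer", left_on="k1", right_on="k2", indicator=True)
inner_df2 = outer_df[outer_df["_merge"] == "both"].drop(columns=["_merge"])

# Same columns and the same matched key pairs...
print(list(inner_df.columns) == list(inner_df2.columns))
print(sorted(zip(inner_df["k1"], inner_df["k2"]))
      == sorted(zip(inner_df2["k1"], inner_df2["k2"])))

# ...but the NaNs created by the outer join upcast x and y from int to float
print(inner_df.dtypes.to_dict())
print(inner_df2.dtypes.to_dict())

# ...and the filtered frame keeps the outer join's index labels, not 0..n-1
print(inner_df.index.tolist(), inner_df2.index.tolist())
```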
The two approaches return the same rows, but not exactly the same DataFrames. The differences are:
inner_df2 has an additional _merge column; that one is harmless, since it is trivial to get rid of it with ...drop(columns='_merge')
Long story short: whether the two are equivalent actually depends on the real use case.
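If the use case only cares about the rows themselves, one possible approach is to normalize both frames before comparing them. The sketch below assumes that; the helper normalized() is hypothetical. It sorts the rows (pandas documents that an outer merge sorts keys lexicographically, while an inner merge preserves the order of the left keys), resets the index and casts back to the inner join's dtypes, after which pandas.testing.assert_frame_equal passes. Casting back to int only works when the matched rows contain no NaN.

```python
import pandas as pd

# Hypothetical toy data; "k1"/"k2" stand in for col_df1/col_df2
df1 = pd.DataFrame({"k1": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"k2": ["b", "c", "d"], "y": [20, 30, 40]})

inner_df = df1.merge(df2, how="inner", left_on="k1", right_on="k2")
outer_df = df1.merge(df2, how="outer", left_on="k1", right_on="k2", indicator=True)


def normalized(df, dtypes):
    """Hypothetical helper: make row order, index labels and dtypes comparable."""
    return (
        df.sort_values(by=list(df.columns))  # ignore row order
        .reset_index(drop=True)              # renumber rows 0..n-1
        .astype(dtypes)                      # undo the float upcast from the outer join
    )


target_dtypes = inner_df.dtypes.to_dict()
expected = normalized(inner_df, target_dtypes)
actual = normalized(
    outer_df[outer_df["_merge"] == "both"].drop(columns="_merge"),
    target_dtypes,
)

# Raises AssertionError if values, dtypes, index or columns differ
pd.testing.assert_frame_equal(expected, actual)
print("equivalent after normalization")
```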