Merging dataframes based on index

How can I merge two DataFrames, df1 and df2, to get a df3 containing the rows of df1 and df2 that have the same index (and the same values in the columns)?

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])

Test 1

df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                    'B': ['B0', 'B1', 'B2', 'B7'],
                    'C': ['C0', 'C1', 'C2', 'C7'],
                    'D': ['D0', 'D1', 'D2', 'D7']},
                   index=[0, 1, 2, 7])

Test 2

df2 = pd.DataFrame({'A': ['A1'],
                    'B': ['B1'],
                    'C': ['C1'],
                    'D': ['D1']},
                   index=[1])

Expected output test 1

Out[13]: 
    A   B   C   D
0  A0  B0  C0  D0
2  A2  B2  C2  D2
7  A7  B7  C7  D7

Expected output test 2

Empty DataFrame
Columns: [A, B, C, D]
Index: []

Just use merge:

In[111]:
df1.merge(df2)

Out[111]: 
    A   B   C   D
0  A0  B0  C0  D0

By default, merge joins on all columns the two frames have in common and performs an inner join, so only rows whose values agree in every column are kept.
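As a minimal sketch of those defaults, the following uses two-column frames modeled on test 1 from the question (the trimmed columns and the names `implicit` and `explicit` are only for illustration). It compares a bare merge with one that spells out `how='inner'` and the shared columns.

```python
import pandas as pd

# Two-column frames modeled on test 1 (trimmed for brevity)
df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7']},
                   index=[0, 2, 3, 7])
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                    'B': ['B0', 'B1', 'B2', 'B7']},
                   index=[0, 1, 2, 7])

# Bare merge: inner join on every column name the frames share
implicit = df1.merge(df2)

# The same join with the defaults written out
explicit = df1.merge(df2, how='inner', on=['A', 'B'])

print(implicit.equals(explicit))
print(implicit)  # the original index labels are replaced by a fresh 0..n-1 range
```

Note that the original row labels are not part of the join here, which is why the index has to be handled separately.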

Looking at the index matching requirement, I'd filter the df prior to the merge:

In[131]:
filtered = df1.reindex(df2.index).dropna()  # df1.loc[df2.index] raises KeyError on missing labels in pandas >= 1.0
filtered

Out[131]: 
    A   B   C   D
1  A1  B1  C1  D1

and then merge

In[132]:
filtered.merge(df2)
Out[132]: 
    A   B   C   D
0  A0  B0  C0  D0

If the indices do not match at all, say the first index value of df2 is 1 instead of 2:

In[133]:
filtered = df1.reindex(df2.index).dropna()  # df1.loc[df2.index] raises KeyError on missing labels in pandas >= 1.0
filtered
Out[133]: 
    A   B   C   D
1  A1  B1  C1  D1

then merge will return an empty df because the index row value doesn't agree:

In[134]:
filtered.merge(df2)

Out[134]: 
Empty DataFrame
Columns: [A, B, C, D]
Index: []

UPDATE

On your new dataset, merge resets the index, which is its default behaviour:

In[152]:
filtered.merge(df2)

Out[152]: 
    A   B   C   D
0  A0  B0  C0  D0
1  A2  B2  C2  D2
2  A7  B7  C7  D7

To retain the indices, we can instead build a boolean mask with the equality operator and call dropna. Masking turns every cell where the values don't agree into NaN, so dropna removes any row containing a mismatch. This should handle all cases:

In[153]:
filtered[filtered == df2.loc[filtered.index]].dropna()

Out[153]: 
    A   B   C   D
0  A0  B0  C0  D0
2  A2  B2  C2  D2
7  A7  B7  C7  D7
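An alternative sketch, not the masking approach shown above: the index can be moved into a column so that merge compares it together with the values, and then restored afterwards. This assumes neither frame already has a column called `_row_label` (a hypothetical temporary name); the helper name `common_rows` is likewise made up for illustration.

```python
import pandas as pd

def common_rows(left, right, key='_row_label'):
    """Rows present in both frames with the same index label and the same values.

    `key` is a temporary column name and must not clash with a real column.
    """
    # Turn each index into an ordinary column so merge can join on it
    l = left.rename_axis(key).reset_index()
    r = right.rename_axis(key).reset_index()
    # Default merge: inner join on all shared columns, now including the label
    merged = l.merge(r)
    # Put the labels back as the index and restore the original index name
    return merged.set_index(key).rename_axis(left.index.name)

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])

# Test 1 from the question
df2_test1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                          'B': ['B0', 'B1', 'B2', 'B7'],
                          'C': ['C0', 'C1', 'C2', 'C7'],
                          'D': ['D0', 'D1', 'D2', 'D7']},
                         index=[0, 1, 2, 7])

# Test 2 from the question
df2_test2 = pd.DataFrame({'A': ['A1'], 'B': ['B1'],
                          'C': ['C1'], 'D': ['D1']},
                         index=[1])

result1 = common_rows(df1, df2_test1)
result2 = common_rows(df1, df2_test2)
print(result1)
print(result2)
```

Because the label takes part in the join, a row whose index matches but whose values differ is excluded, and a row whose values match under a different label is excluded too.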

If you are sure that rows sharing an index also share the same values, you can filter on the index alone:

df1.loc[df1.index.to_series().isin(df2.index)]

There's no need to do a merge.
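A short sketch of what index-only filtering does and does not check. The single-column frames and the deliberately mismatched value 'XX' are assumptions made for illustration; `Index.isin` is used directly here, which should be equivalent to going through `to_series()`.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7']}, index=[0, 2, 3, 7])
# Label 2 exists in both frames, but the value differs on purpose
df2 = pd.DataFrame({'A': ['A0', 'A1', 'XX', 'A7']}, index=[0, 1, 2, 7])

# Keep the rows of df1 whose label also appears in df2's index
by_index = df1.loc[df1.index.isin(df2.index)]
print(by_index)

# Row 2 is kept even though df2 holds 'XX' there: values are never compared
print(by_index.loc[2, 'A'], df2.loc[2, 'A'])
```

This keeps the original index and avoids a join entirely, but it is only safe when matching labels are known to imply matching values.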


 