How to compare two dataframes and find matches from columns (pandas)

Let's say we have the following code example, where we create two basic dataframes:

import pandas as pd
 
# Creating Dataframes
a = [{'Name': 'abc', 'Age': 8, 'Grade': 3},
     {'Name': 'xyz', 'Age': 9, 'Grade': 3}]
 
df1 = pd.DataFrame(a)
b = [{'ID': 1, 'Name': 'abc', 'Age': 8},
     {'ID': 2, 'Name': 'xyz', 'Age': 9}]
 
df2 = pd.DataFrame(b)
 
# Printing Dataframes (print works in a plain script; display() needs Jupyter/IPython)
print(df1)
print(df2)

We get the following datasets:

    Name   Age  Grade
0   abc    8    3
1   xyz    9    3


    ID   Name   Age
0   1    abc    8
1   2    xyz    9

How can I find the columns that appear in only one of the two frames, i.e. the columns the frames do not share? As a result, I want to get the following column names: ['Grade', 'ID']

Use symmetric_difference

res = df2.columns.symmetric_difference(df1.columns)
print(res)

Output

Index(['Grade', 'ID'], dtype='object')
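Since the question asks for a list, here is a minimal sketch of turning the resulting Index into a plain Python list, and of splitting the result by the frame each column comes from. The variable names unique_cols, only_in_df1 and only_in_df2 are illustrative and not part of the original answer; Index.tolist() and Index.difference() are standard pandas methods.

```python
import pandas as pd

# Same frames as in the question
df1 = pd.DataFrame([{'Name': 'abc', 'Age': 8, 'Grade': 3},
                    {'Name': 'xyz', 'Age': 9, 'Grade': 3}])
df2 = pd.DataFrame([{'ID': 1, 'Name': 'abc', 'Age': 8},
                    {'ID': 2, 'Name': 'xyz', 'Age': 9}])

# Convert the Index returned by symmetric_difference into a plain list
unique_cols = df2.columns.symmetric_difference(df1.columns).tolist()
print(unique_cols)   # ['Grade', 'ID']

# Columns that exist in only one specific frame
only_in_df1 = df1.columns.difference(df2.columns).tolist()
only_in_df2 = df2.columns.difference(df1.columns).tolist()
print(only_in_df1, only_in_df2)   # ['Grade'] ['ID']
```

The symmetric difference is simply the union of these two one-sided differences.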

Or as an alternative, use set.symmetric_difference

res = set(df2.columns).symmetric_difference(df1.columns)
print(res)

Output

{'Grade', 'ID'}

A third alternative, suggested by @SashSinha, is to use the shortcut:

res = df2.columns ^ df1.columns

but as of pandas 1.4.3 this issues a warning:

FutureWarning: Index.__xor__ operating as a set operation is deprecated, in the future this will be a logical operation matching Series.__xor__. Use index.symmetric_difference(other) instead.
  res = df2.columns ^ df1.columns
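If the ^ shorthand is still appealing, one possible workaround (a sketch, not part of the original answer) is to apply it to built-in Python sets instead of Index objects. On sets, ^ is ordinary set symmetric difference, so the pandas deprecation should not come into play. Sets are unordered, so sorted() is used here to get a stable list.

```python
import pandas as pd

# Same frames as in the question
df1 = pd.DataFrame([{'Name': 'abc', 'Age': 8, 'Grade': 3},
                    {'Name': 'xyz', 'Age': 9, 'Grade': 3}])
df2 = pd.DataFrame([{'ID': 1, 'Name': 'abc', 'Age': 8},
                    {'ID': 2, 'Name': 'xyz', 'Age': 9}])

# ^ between two built-in sets is set symmetric difference;
# sorted() turns the unordered set into a deterministic list
res = sorted(set(df2.columns) ^ set(df1.columns))
print(res)   # ['Grade', 'ID']
```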
