
How to check if two columns are logically the same in pandas?

Say I have a dataframe as below:

column_1  column_2
1         car
2         truck
1         car
3         plane
3         plane
2         truck

You can clearly see that column_1 logically describes the same thing as column_2: every value in one column always pairs with the same value in the other. But my dataset is huge, so I can't rely on visual inspection to spot this relationship between the two columns. How can I check whether two columns (as in the example) are logically the same?

Use factorize to encode each column as integer codes in order of first appearance, compare the two code arrays, and use all to test whether every comparison is True:

print((pd.factorize(df['column_1'])[0] == pd.factorize(df['column_2'])[0]).all())
True
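
To make the factorize comparison easier to follow, here is a small self-contained sketch that rebuilds the example dataframe from the question and prints the intermediate code arrays. The variable names (codes_1, codes_2, same) are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "column_1": [1, 2, 1, 3, 3, 2],
    "column_2": ["car", "truck", "car", "plane", "plane", "truck"],
})

# factorize returns (codes, uniques); codes number the distinct values
# in order of first appearance, regardless of the original dtype
codes_1, uniques_1 = pd.factorize(df["column_1"])
codes_2, uniques_2 = pd.factorize(df["column_2"])

print(codes_1)  # [0 1 0 2 2 1]
print(codes_2)  # [0 1 0 2 2 1]

# The columns are "logically the same" when the two code arrays match everywhere
same = bool((codes_1 == codes_2).all())
print(same)  # True
```

Because the codes depend only on the order in which distinct values first appear, two columns that label the same groups get identical code arrays even though their raw values differ.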

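Another idea is to build a dictionary that maps column_1 to column_2 and check that mapping column_1 through it reproduces column_2: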

d = df.set_index('column_1')['column_2'].to_dict()
print(df['column_1'].map(d).eq(df['column_2']).all())
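
One thing worth noting about the mapping check: it only verifies that each value of column_1 determines a single value of column_2, not the reverse, so it can return True where the factorize comparison returns False. The sketch below illustrates this with a hypothetical helper, is_one_to_one, and a made-up counterexample df_bad; neither comes from the original answer.

```python
import pandas as pd

def is_one_to_one(df, a, b):
    """Hypothetical helper: True if every value of column `a` pairs with
    exactly one value of column `b`, and vice versa."""
    # Keep each distinct (a, b) pair once
    pairs = df[[a, b]].drop_duplicates()
    # One-to-one means no value repeats on either side of the distinct pairs
    return pairs[a].is_unique and pairs[b].is_unique

# Example data copied from the question
df = pd.DataFrame({
    "column_1": [1, 2, 1, 3, 3, 2],
    "column_2": ["car", "truck", "car", "plane", "plane", "truck"],
})

# Made-up counterexample: two different ids share the label "car"
df_bad = pd.DataFrame({
    "column_1": [1, 2, 3],
    "column_2": ["car", "car", "plane"],
})

good_result = is_one_to_one(df, "column_1", "column_2")
bad_result = is_one_to_one(df_bad, "column_1", "column_2")

# The one-directional mapping check still passes on the counterexample
d = df_bad.set_index("column_1")["column_2"].to_dict()
mapping_ok = bool(df_bad["column_1"].map(d).eq(df_bad["column_2"]).all())

print(good_result)  # True
print(bad_result)   # False
print(mapping_ok)   # True
```

If a one-directional dependency is all you need, the mapping check is enough; for a strict two-way correspondence, use the factorize comparison or a check in both directions like the one sketched here.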
