Check whether two Python tables are tibble-equivalent

I want to write a function that checks whether two tables are tibble-equivalent (identical variables and observations, regardless of row or column order). For example, the first two tables below are equivalent; the third is not equivalent to them.

a   b   c
x   1   hat
y   2   cat
z   3   bat
w   4   rat

b   c   a
2   cat y
3   bat z
1   hat x
4   rat w

a   b   c
2   y   cat
3   z   bat
1   x   hat
4   w   rat

I decided to solve this by comparing the maximum value of each column. How do I properly refer to the first, second, etc. column and compare their maximum values?

def equal(A, B):
    A_names = sorted(A.columns)
    X = A[var_names].copy()
    B_names=sorted(B.columns)
    Y=B[var_names].copy()

    if A[0].max()==B[0].max() and A[1].max()==B[1].max():
        return True
    else:
        return False

This raises an error: KeyError: 0
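
The KeyError comes from A[0]: square brackets on a DataFrame look up a column by label, and no column here is labelled 0. Positional access goes through iloc instead. A minimal sketch, assuming the first table from the question is held in a DataFrame named A (the variable names for the results are made up for illustration):

```python
import pandas as pd

# First table from the question.
A = pd.DataFrame({
    "a": ["x", "y", "z", "w"],
    "b": [1, 2, 3, 4],
    "c": ["hat", "cat", "bat", "rat"],
})

# A[0] would raise KeyError: 0, because 0 is not a column label.
first_col_max = A.iloc[:, 0].max()   # first column, selected by position
second_col_max = A.iloc[:, 1].max()  # second column, selected by position
```

This removes the error, but matching maxima do not show that two tables hold the same rows, so a full comparison still needs a different approach.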

This task can be solved with the equals method of the DataFrame object, after some preprocessing of the DataFrames:

def compare_dataframes(df1, df2):
    df1_cols = df1.columns.tolist()
    df2_cols = df2.columns.tolist()
    # column names and shapes should be equal for both dataframes 
    if set(df1_cols).symmetric_difference(set(df2_cols)) or (df1.shape != df2.shape):
        return False
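    # sort the rows of both dataframes by the same columns so row order does not matter
    df1_sorted = df1.sort_values(by=df1_cols).reset_index(drop=True)
    df2_sorted = df2.sort_values(by=df1_cols).reset_index(drop=True)
    # align the column order before comparing
    df2_sorted = df2_sorted[df1_sorted.columns]
    return df1_sorted.equals(df2_sorted)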
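
To see the sort-then-equals idea applied to the three tables from the question, here is a self-contained sketch. The helper name same_rows_and_columns and the variables t1, t2, t3 are made up for this illustration; only standard pandas calls are used.

```python
import pandas as pd

def same_rows_and_columns(df1, df2):
    """Return True if both frames hold the same columns and the same rows, in any order."""
    cols = sorted(df1.columns)
    if cols != sorted(df2.columns) or df1.shape != df2.shape:
        return False
    # Put columns in one fixed order, sort rows, and drop the old index
    # so that only the values take part in the comparison.
    left = df1[cols].sort_values(by=cols).reset_index(drop=True)
    right = df2[cols].sort_values(by=cols).reset_index(drop=True)
    return left.equals(right)

# The three tables from the question.
t1 = pd.DataFrame({"a": ["x", "y", "z", "w"], "b": [1, 2, 3, 4],
                   "c": ["hat", "cat", "bat", "rat"]})
t2 = pd.DataFrame({"b": [2, 3, 1, 4], "c": ["cat", "bat", "hat", "rat"],
                   "a": ["y", "z", "x", "w"]})
t3 = pd.DataFrame({"a": [2, 3, 1, 4], "b": ["y", "z", "x", "w"],
                   "c": ["cat", "bat", "hat", "rat"]})

first_vs_second = same_rows_and_columns(t1, t2)  # same data, shuffled rows and columns
first_vs_third = same_rows_and_columns(t1, t3)   # values of a and b are swapped
```

Here first_vs_second is True and first_vs_third is False, which matches what the question expects.
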
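
Alternatively, sort the columns first, then the rows, and compare the results:

def equal(A, B):
    A_var_names = sorted(A.columns)
    AA = A[A_var_names].copy()  # column order
    AA.sort_values(by=A_var_names, inplace=True)  # row order
    AA.reset_index(drop=True, inplace=True)  # equals also compares the index

    B_var_names = sorted(B.columns)
    BB = B[B_var_names].copy()
    BB.sort_values(by=B_var_names, inplace=True)
    BB.reset_index(drop=True, inplace=True)

    return AA.equals(BB)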
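
One caveat with equals: it also requires corresponding columns to have the same dtype, so two tables with the same numbers stored as integers in one and floats in the other compare as different. A small sketch of that behaviour, with made-up frame names; whether to cast before comparing depends on how strict the equivalence should be:

```python
import pandas as pd

ints = pd.DataFrame({"b": [1, 2, 3]})          # column dtype int64
floats = pd.DataFrame({"b": [1.0, 2.0, 3.0]})  # column dtype float64

# Same numeric values, different dtypes: equals reports a mismatch.
strict = ints.equals(floats)

# Casting to a common dtype first makes the comparison value-based.
relaxed = ints.astype("float64").equals(floats)
```

In this sketch strict is False and relaxed is True.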
