How to show differences from two pandas dataframes of different sizes

Question

If I have two dataframes that look like:

current_month

Product	Revenue	Expense	Profit	PaymentFrequency	Customer
A	100	100	0	Monthly	Cust1
B	200	150	50	Monthly	Cust2
C	90	80	10	Monthly	Cust3

previous_month

Product	Revenue	Expense	Profit	PaymentFrequency	Customer
A	120	120	0	Monthly	Cust1
B	250	175	75	Monthly	Cust1

For each product I would like to have a table of just the differences:

Product A

month	Revenue	Expense
current_month	100	100
previous_month	120	120

Product B

month	Revenue	Expense	Profit	Customer
current_month	200	150	50	Cust2
previous_month	250	175	75	Cust1

Product C

month	Revenue	Expense	Profit	PaymentFrequency	Customer
current_month	90	80	10	Monthly	Cust3
previous_month	NaN	NaN	NaN	NaN	NaN

I've been able to identify the differences using a for loop and.loc. However, I am struggling to get the desired output.

for product in list(current_month.index):
    for field in list(current_month.columns):
        try:
            if current_month[field].loc[product] != previous_month[field].loc[product]:
                print(f'field: {field}')
                print(f'product: {product}')
                print(f'new value: {current_month[field].loc[product]}')
                print(f'old value: {previous_month[field].loc[product]}') 
        except KeyError:
            print(f'field: {field}')
            print(f'product: {product}')
            print(f'new value: {current_month[field].loc[product]}')
            print(f'NaN')

Answer 1

(i) First merge the dataframes and stack them; this will create a MultiIndex pd.Series object df_m .

(ii) Rename the MultiIndex, sort by it and unstack .

(iii) Filter for products (which is the first level of the MultiIndex), transpose the dataframe and use drop_duplicates if a value is repeated across two months.

df_m = df1.merge(df2, on='Product', how='outer', suffixes=(' current', ' previous')).set_index('Product').stack()
df_m.index = pd.MultiIndex.from_tuples([(i,)+tuple(j.split()) for i,j in df_m.index])
df_m = df_m.sort_index().unstack()


out = [(df_m[df_m.index.get_level_values(0) == product]
        .T
        .replace(np.nan,'NaN')
        .apply(lambda x: x.drop_duplicates(keep=False), axis=0)
        .dropna(axis=1)
        .replace('NaN',np.nan)) 
       for product in ['A','B','C']]
productA, productB, productC = out

Output:

               A        
         Expense Revenue
current      100     100
previous   120.0   120.0

                B                       
         Customer Expense Profit Revenue
current     Cust2     150     50     200
previous    Cust1   175.0   75.0   250.0

                C                                        
         Customer Expense PaymentFrequency Profit Revenue
current     Cust3    80.0          Monthly   10.0    90.0
previous      NaN     NaN              NaN    NaN     NaN

How to show differences from two pandas dataframes of different sizes

Question

1 answers

solution1
0 2022-01-07 00:45:34

How to show differences from two pandas dataframes of different sizes

Question

1 answers

solution1 0 2022-01-07 00:45:34

solution1
0 2022-01-07 00:45:34