
How to show differences from two pandas dataframes of different sizes

If I have two dataframes that look like:


Product Revenue Expense Profit PaymentFrequency Customer
A 100 100 0 Monthly Cust1
B 200 150 50 Monthly Cust2
C 90 80 10 Monthly Cust3


Product Revenue Expense Profit PaymentFrequency Customer
A 120 120 0 Monthly Cust1
B 250 175 75 Monthly Cust1
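
To make the example reproducible, the two frames can be built roughly as follows. This is only a sketch: the names df1 (current month) and df2 (previous month) are assumptions chosen to match the merge call further down, and current_month / previous_month are assumed to be the same data indexed by Product, as the loop in the question implies.

```python
import pandas as pd

# Current month (assumed name: df1)
df1 = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Revenue": [100, 200, 90],
    "Expense": [100, 150, 80],
    "Profit": [0, 50, 10],
    "PaymentFrequency": ["Monthly", "Monthly", "Monthly"],
    "Customer": ["Cust1", "Cust2", "Cust3"],
})

# Previous month (assumed name: df2); product C is absent here
df2 = pd.DataFrame({
    "Product": ["A", "B"],
    "Revenue": [120, 250],
    "Expense": [120, 175],
    "Profit": [0, 75],
    "PaymentFrequency": ["Monthly", "Monthly"],
    "Customer": ["Cust1", "Cust1"],
})

# The loop in the question looks products up by index label,
# so Product is assumed to be the index of these two frames
current_month = df1.set_index("Product")
previous_month = df2.set_index("Product")
```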

For each product I would like to have a table of just the differences:

Product A

month Revenue Expense
current_month 100 100
previous_month 120 120

Product B

month Revenue Expense Profit Customer
current_month 200 150 50 Cust2
previous_month 250 175 75 Cust1

Product C

month Revenue Expense Profit PaymentFrequency Customer
current_month 90 80 10 Monthly Cust3
previous_month NaN NaN NaN NaN NaN

I've been able to identify the differences using a for loop and .loc. However, I am struggling to get the desired output.

for product in current_month.index:
    for field in current_month.columns:
        try:
            if current_month[field].loc[product] != previous_month[field].loc[product]:
                print(f'field: {field}')
                print(f'product: {product}')
                print(f'new value: {current_month[field].loc[product]}')
                print(f'old value: {previous_month[field].loc[product]}')
        except KeyError:
            # the product does not exist in previous_month
            print(f'field: {field}')
            print(f'product: {product}')
            print(f'new value: {current_month[field].loc[product]}')

(i) First merge the dataframes and stack them; this creates a pd.Series df_m with a MultiIndex.

(ii) Rebuild the MultiIndex by splitting each suffixed column name into (field, month), sort by it, and unstack.

(iii) Filter for each product (the first level of the MultiIndex), transpose the dataframe, and use drop_duplicates to drop any value that is repeated across the two months.

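import pandas as pd

# df1 is the current month, df2 the previous month
df_m = df1.merge(df2, on='Product', how='outer', suffixes=(' current', ' previous')).set_index('Product').stack()
# split e.g. 'Revenue current' into ('Revenue', 'current') to get a three-level index
df_m.index = pd.MultiIndex.from_tuples([(i,) + tuple(j.split()) for i, j in df_m.index])
df_m = df_m.sort_index().unstack()

# per product: drop the product level, transpose, drop values equal in both months
out = [(df_m[df_m.index.get_level_values(0) == product]
        .droplevel(0).T
        .apply(lambda x: x.drop_duplicates(keep=False), axis=0)
        .dropna(axis=1, how='all'))
       for product in ['A', 'B', 'C']]
productA, productB, productC = out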


         Expense Revenue
current      100     100
previous   120.0   120.0

         Customer Expense Profit Revenue
current     Cust2     150     50     200
previous    Cust1   175.0   75.0   250.0

         Customer Expense PaymentFrequency Profit Revenue
current     Cust3    80.0          Monthly   10.0    90.0
previous      NaN     NaN              NaN    NaN     NaN
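
The same per-product tables can also be sketched without merge and stack, by putting the two rows for one product side by side and keeping only the fields where they differ. The helper name diff_table and the frame names below are illustrative assumptions, not part of the answer above; a product missing from one month is assumed to show up as an all-NaN row.

```python
import pandas as pd

cols = ["Product", "Revenue", "Expense", "Profit", "PaymentFrequency", "Customer"]
current_month = pd.DataFrame(
    [["A", 100, 100, 0, "Monthly", "Cust1"],
     ["B", 200, 150, 50, "Monthly", "Cust2"],
     ["C", 90, 80, 10, "Monthly", "Cust3"]], columns=cols).set_index("Product")
previous_month = pd.DataFrame(
    [["A", 120, 120, 0, "Monthly", "Cust1"],
     ["B", 250, 175, 75, "Monthly", "Cust1"]], columns=cols).set_index("Product")


def diff_table(current, previous, product):
    """Return a two-row table holding only the fields that differ for one product."""
    # reindex yields an all-NaN row when the product is missing from a frame
    cur = current.reindex([product]).iloc[0]
    prev = previous.reindex([product], columns=current.columns).iloc[0]
    table = pd.DataFrame({"current_month": cur, "previous_month": prev}).T
    # a field differs when the values are unequal and not both missing
    differs = (cur != prev) & ~(cur.isna() & prev.isna())
    return table.loc[:, differs]


# one table per product, keyed by product label
tables = {p: diff_table(current_month, previous_month, p) for p in current_month.index}
for product, table in tables.items():
    print(f"Product {product}")
    print(table, end="\n\n")
```

Keeping the tables in a dict keyed by product avoids hard-coding the product list, at the cost of one reindex per product, which should be fine for small frames.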

