If I have two dataframes that look like:
current_month
Product | Revenue | Expense | Profit | PaymentFrequency | Customer |
---|---|---|---|---|---|
A | 100 | 100 | 0 | Monthly | Cust1 |
B | 200 | 150 | 50 | Monthly | Cust2 |
C | 90 | 80 | 10 | Monthly | Cust3 |
previous_month
Product | Revenue | Expense | Profit | PaymentFrequency | Customer |
---|---|---|---|---|---|
A | 120 | 120 | 0 | Monthly | Cust1 |
B | 250 | 175 | 75 | Monthly | Cust1 |
For each product I would like to have a table of just the differences:
Product A
month | Revenue | Expense |
---|---|---|
current_month | 100 | 100 |
previous_month | 120 | 120 |
Product B
month | Revenue | Expense | Profit | Customer |
---|---|---|---|---|
current_month | 200 | 150 | 50 | Cust2 |
previous_month | 250 | 175 | 75 | Cust1 |
Product C
month | Revenue | Expense | Profit | PaymentFrequency | Customer |
---|---|---|---|---|---|
current_month | 90 | 80 | 10 | Monthly | Cust3 |
previous_month | NaN | NaN | NaN | NaN | NaN |
I've been able to identify the differences using a for loop and.loc. However, I am struggling to get the desired output.
for product in list(current_month.index):
for field in list(current_month.columns):
try:
if current_month[field].loc[product] != previous_month[field].loc[product]:
print(f'field: {field}')
print(f'product: {product}')
print(f'new value: {current_month[field].loc[product]}')
print(f'old value: {previous_month[field].loc[product]}')
except KeyError:
print(f'field: {field}')
print(f'product: {product}')
print(f'new value: {current_month[field].loc[product]}')
print(f'NaN')
(i) First merge
the dataframes and stack
them; this will create a MultiIndex pd.Series object df_m
.
(ii) Rename the MultiIndex, sort by it and unstack
.
(iii) Filter for products (which is the first level of the MultiIndex), transpose the dataframe and use drop_duplicates
if a value is repeated across two months.
df_m = df1.merge(df2, on='Product', how='outer', suffixes=(' current', ' previous')).set_index('Product').stack()
df_m.index = pd.MultiIndex.from_tuples([(i,)+tuple(j.split()) for i,j in df_m.index])
df_m = df_m.sort_index().unstack()
out = [(df_m[df_m.index.get_level_values(0) == product]
.T
.replace(np.nan,'NaN')
.apply(lambda x: x.drop_duplicates(keep=False), axis=0)
.dropna(axis=1)
.replace('NaN',np.nan))
for product in ['A','B','C']]
productA, productB, productC = out
Output:
A
Expense Revenue
current 100 100
previous 120.0 120.0
B
Customer Expense Profit Revenue
current Cust2 150 50 200
previous Cust1 175.0 75.0 250.0
C
Customer Expense PaymentFrequency Profit Revenue
current Cust3 80.0 Monthly 10.0 90.0
previous NaN NaN NaN NaN NaN
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.