
Calculate difference between columns with same underlying name

I have a df that holds both new and old data. Is there a way to calculate the difference between the old and new values? For generality, I don't want to sort the dataframe; I only want to compare root variables whose columns carry the suffix "_old" or "_new".

df
     apple_old      daily    banana_new    banana_tree   banana_old apple_new
0      5             3           4              2           10        6

for x in df.columns:
    if x.endswith("_old") and x.endswith("_new"):
        x = x.dif()

Expected Output; brackets are shown just for clarity

df_diff
     apple_diff(old-new)         banana_diff(old-new)       
0      -1       (5-6)                      6   (10-4)              

Let's try creating a MultiIndex from the column names, then subtracting new from old.

Setup:

import pandas as pd

df = pd.DataFrame({'apple_old': {0: 5}, 'daily': {0: 3}, 'banana_new': {0: 4},
                   'banana_tree': {0: 2}, 'banana_old': {0: 10},
                   'apple_new': {0: 6}})

# Creation of Multi-Index:
df.columns = df.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)
# Subtract new from old (old - new):
output_df = (df['old'] - df['new']).add_suffix('_diff')
# Display:
print(output_df)
   apple_diff  banana_diff
0          -1            6

The MultiIndex is built with str.rsplit and n=1 (at most one split, starting from the right), so column names that contain several underscores are split only at the last one:

df.columns = df.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)
    old   NaN    new   tree    old   new
  apple daily banana banana banana apple
0     5     3      4      2     10     6
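
To illustrate why n=1 matters, here is a small sketch using a made-up column name, green_apple_old, that is not in the original data. With n=1 the split happens only at the last underscore, so the root keeps its inner underscore:

```python
import pandas as pd

# 'green_apple_old' / 'green_apple_new' are hypothetical names for illustration only.
cols = pd.Index(['green_apple_old', 'green_apple_new', 'daily'])

# Split once, from the right; expand=True turns the pieces into MultiIndex levels.
split_cols = cols.str.rsplit('_', n=1, expand=True)

roots = split_cols.get_level_values(0)     # part before the last underscore
suffixes = split_cols.get_level_values(1)  # part after it (missing for 'daily')

print(list(roots))
print(list(suffixes))
```

A name without any underscore, such as daily, ends up with a missing value in the suffix level, which is why it never shows up under 'old' or 'new'.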

Then selection:

df['old']

   apple  banana
0      5      10

df['new']

   banana  apple
0       4      6

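The subtraction aligns on column labels, so it does not matter that df['old'] and df['new'] list their columns in a different order. add_suffix then appends _diff to each column name.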
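
If you would rather leave df.columns untouched, a rough alternative is to match the suffixes with plain string methods instead of a MultiIndex. This is only a sketch, not the approach shown above; the helper name old_minus_new and its parameters are made up for illustration:

```python
import pandas as pd

def old_minus_new(df, old_suffix="_old", new_suffix="_new"):
    """Return old - new for every root that has both an old and a new column.

    Hypothetical helper: the name and signature are assumptions, not a pandas API.
    """
    diffs = {}
    for col in df.columns:
        if not col.endswith(old_suffix):
            continue
        root = col[: -len(old_suffix)]
        new_col = root + new_suffix
        # Skip roots that have no matching "_new" column.
        if new_col in df.columns:
            diffs[root + "_diff"] = df[col] - df[new_col]
    return pd.DataFrame(diffs, index=df.index)

df = pd.DataFrame({'apple_old': {0: 5}, 'daily': {0: 3}, 'banana_new': {0: 4},
                   'banana_tree': {0: 2}, 'banana_old': {0: 10},
                   'apple_new': {0: 6}})

result = old_minus_new(df)
print(result)
```

The output columns follow the order in which the _old columns appear in df, and columns such as daily or banana_tree are ignored because they do not end in either suffix.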

