Calculate the difference between columns that share the same base name

Question

I have a df that holds both new and old data. Is there a way to calculate the difference between the old and new values? For generality, I don't want to sort the dataframe; I only want to compare the root variables whose columns carry the suffix "_old" or "_new".

df
     apple_old      daily    banana_new    banana_tree   banana_old apple_new
0      5             3           4              2           10        6

My attempt so far, which does not work:

for x in df.columns:
    if x.endswith("_old") and x.endswith("_new"):
        x = x.dif()

Expected output (the parentheses are shown only for clarity):

df_diff
     apple_diff(old-new)         banana_diff(old-new)       
0      -1       (5-6)                      6   (10-4)

Answer 1

Let's try creating a MultiIndex from the column names, then subtracting new from old.

Setup:

import pandas as pd

df = pd.DataFrame({'apple_old': {0: 5}, 'daily': {0: 3}, 'banana_new': {0: 4},
                   'banana_tree': {0: 2}, 'banana_old': {0: 10},
                   'apple_new': {0: 6}})

# Creation of Multi-Index:
df.columns = df.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)
# Subtract new from old (old - new):
output_df = (df['old'] - df['new']).add_suffix('_diff')
# Display:
print(output_df)

   apple_diff  banana_diff
0          -1            6
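Note that assigning to df.columns replaces the original column labels. If that is unwanted, a plain loop over the columns, closer to the attempt in the question, is one alternative. The following is only a minimal sketch: diff_old_new is a hypothetical helper name, and it assumes every column label is a string.

```python
import pandas as pd


def diff_old_new(frame: pd.DataFrame) -> pd.DataFrame:
    """Return old - new for each base name that has both an _old and a _new column."""
    diffs = {}
    for col in frame.columns:
        if not col.endswith('_old'):
            continue
        base = col[:-len('_old')]       # 'apple_old' -> 'apple'
        new_col = base + '_new'
        if new_col in frame.columns:    # skip base names without a partner
            diffs[base + '_diff'] = frame[col] - frame[new_col]
    # index=frame.index keeps the row labels even when no pair was found
    return pd.DataFrame(diffs, index=frame.index)


# Same data as in the setup, under a different name so df is left alone
sample = pd.DataFrame({'apple_old': [5], 'daily': [3], 'banana_new': [4],
                       'banana_tree': [2], 'banana_old': [10], 'apple_new': [6]})
result = diff_old_new(sample)
print(result)
```

The input frame keeps its labels, and columns such as daily or banana_tree are simply ignored because they do not end in _old or _new.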

The MultiIndex is built with str.rsplit and a maximum of one split (n=1), so column names that contain several underscores are handled safely:

df.columns = df.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)

    old   NaN    new   tree    old   new
  apple daily banana banana banana apple
0     5     3      4      2     10     6
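To see why n=1 matters, here is a small sketch. The column name green_apple_old is a made-up example that does not appear in the original data.

```python
import pandas as pd

# Hypothetical labels with more than one underscore
cols = pd.Index(['green_apple_old', 'green_apple_new'])

# Splitting once from the right keeps the base name in one piece
split_last = cols.str.rsplit('_', n=1, expand=True)
print(split_last.tolist())

# Plain Python strings behave the same way
print('green_apple_old'.rsplit('_', 1))   # ['green_apple', 'old']
print('green_apple_old'.split('_'))       # ['green', 'apple', 'old']
```

Without the limit, the base name would be broken into several pieces and could no longer be matched between the old and new columns.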

Then select each group by its top-level label:

df['old']

   apple  banana
0      5      10

df['new']

   banana  apple
0       4      6

Subtraction aligns on column labels, so the different column order in the two selections does not matter. add_suffix then appends _diff to each column name.
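The label alignment can be checked in isolation. In this sketch the frames old and new are built by hand, and the cherry column is a made-up addition showing what happens when a base name has no counterpart.

```python
import pandas as pd

# Columns deliberately in a different order, plus one unmatched column
old = pd.DataFrame({'apple': [5], 'banana': [10], 'cherry': [7]})
new = pd.DataFrame({'banana': [4], 'apple': [6]})

# Values are paired by column label, not by position
diff = old - new
print(diff)
```

apple and banana are matched by name despite the order, while cherry comes out as NaN because it exists in only one frame. If such unmatched names can occur, they may need to be dropped before add_suffix is applied.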

Answer 1
2 · accepted · 2021-07-13 13:39:56
