Calculate difference between columns with same underlying name
I have a df which compares the new and old data. Is there a way to calculate the difference between the old and new data? For generality, I don't want to sort the dataframe, but only compare root variables that share a name and end with the suffix "_old" or "_new".
df
   apple_old  daily  banana_new  banana_tree  banana_old  apple_new
0          5      3           4            2          10          6
for x in df.columns:
    if x.endswith("_old") and x.endswith("_new"):
        x = x.dif()
Expected output (brackets are shown just for clarity):
df_diff
   apple_diff(old-new)  banana_diff(old-new)
0             -1 (5-6)              6 (10-4)
Let's try creating a MultiIndex on the columns, then subtracting new from old.
Setup:
import pandas as pd
df = pd.DataFrame({'apple_old': {0: 5}, 'daily': {0: 3}, 'banana_new': {0: 4},
'banana_tree': {0: 2}, 'banana_old': {0: 10},
'apple_new': {0: 6}})
# Creation of Multi-Index:
df.columns = df.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)
# Subtract new from old (old - new):
output_df = (df['old'] - df['new']).add_suffix('_diff')
# Display:
print(output_df)
   apple_diff  banana_diff
0          -1            6
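Note that assigning to df.columns changes the original frame. If that is not wanted, the same steps can be wrapped in a function that works on a copy. This is a minimal sketch under the same column naming as the setup; the name diff_old_new is hypothetical and not part of pandas.

```python
import pandas as pd

def diff_old_new(df):
    """Return old - new per root name, leaving the input frame untouched.

    Assumes every root of interest has both an '_old' and a '_new' column.
    """
    out = df.copy()
    # Split each name once from the right and put the suffix on the top level.
    out.columns = out.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)
    return (out['old'] - out['new']).add_suffix('_diff')

df = pd.DataFrame({'apple_old': [5], 'daily': [3], 'banana_new': [4],
                   'banana_tree': [2], 'banana_old': [10], 'apple_new': [6]})
result = diff_old_new(df)
print(result)
print(list(df.columns))  # the original flat column names are still in place
```

Working on a copy costs some memory, but it keeps the flat column names available for anything else that reads df afterwards.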
The MultiIndex is built with str.rsplit and a maximum of one split (n=1), so column names containing multiple _ are handled safely:
df.columns = df.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)
     old    NaN     new    tree     old    new
   apple  daily  banana  banana  banana  apple
0      5      3       4       2      10      6
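To see why n=1 matters, here is a small sketch using made-up column names (green_apple_old and green_apple_new are hypothetical, chosen only because their root itself contains an underscore):

```python
import pandas as pd

# Hypothetical column names, used only to illustrate the split behaviour.
cols = pd.Index(['green_apple_old', 'green_apple_new', 'daily'])

# Split once, starting from the right: only the last underscore is used,
# so the root 'green_apple' stays in one piece.
split = cols.str.rsplit('_', n=1, expand=True)

print(split[0])  # ('green_apple', 'old')
print(split[1])  # ('green_apple', 'new')
print(split[2])  # 'daily' has no underscore, so its second level is missing (NaN)
```

Names without an underscore, such as daily, end up with a missing value in the second level, which is why they appear under NaN after swaplevel and are not picked up by df['old'] or df['new'].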
Then selection:
df['old']
   apple  banana
0      5      10
df['new']
   banana  apple
0       4      6
Subtraction aligns on the column labels, so the differing column order of df['old'] and df['new'] does not matter. Then add_suffix appends _diff to the column names.
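Label alignment also determines what happens when a root has only one of the two suffixes. The sketch below adds a hypothetical cherry_old column with no cherry_new partner. The unmatched root comes out as NaN, and one possible way to discard it is dropna, assuming that an all-NaN column can only mean a missing partner.

```python
import pandas as pd

# 'cherry_old' is a hypothetical column with no '_new' counterpart.
df = pd.DataFrame({'apple_old': [5], 'banana_new': [4], 'banana_old': [10],
                   'apple_new': [6], 'cherry_old': [7]})
df.columns = df.columns.str.rsplit('_', n=1, expand=True).swaplevel(0, 1)

# Columns are matched by label (apple, banana, cherry), not by position.
diff = df['old'] - df['new']
print(diff)  # cherry is NaN because it exists only under 'old'

# Drop roots that produced nothing but NaN, then label the result.
paired = diff.dropna(axis=1, how='all').add_suffix('_diff')
print(paired)
```

If the data itself may contain columns that are entirely NaN, selecting the shared labels with df['old'].columns.intersection(df['new'].columns) before subtracting would be a safer choice than dropna.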