How to rearrange rows in a DataFrame and obtain new columns holding the percentage difference of two other columns in pandas?
I have a DataFrame as shown below:
Case  Peak 'A'  Peak 'B'  Volume 'C'  Volume 'D'
1         5.00      4.00        0.34        0.32
2         5.70      6.00        0.14        0.15
3        11.00     20.00        0.42        0.50
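To make the code in the answers easy to try, here is a minimal sketch that rebuilds this sample frame. The dtypes (plain floats) are an assumption; the column names are copied from the question, and because they contain spaces and quotes they must be accessed with bracket syntax such as df["Peak 'A'"].

```python
import pandas as pd

# Sample data from the question; column names contain spaces and quotes,
# so attribute access (df.A) will not work, only df["Peak 'A'"]
df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Peak 'A'": [5.00, 5.70, 11.00],
    "Peak 'B'": [4.00, 6.00, 20.00],
    "Volume 'C'": [0.34, 0.14, 0.42],
    "Volume 'D'": [0.32, 0.15, 0.50],
})
print(df)
```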
In the expected output:
A 'Diff Peak' column needs to be added, holding the percentage difference ((B - A) / B) * 100.
A 'Diff Vol' column is to be added as ((D - C) / D) * 100, which is also a percentage difference.
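A 'Within Range' column is to be added under 'Peak': if 'Diff Peak' is in the range -15% to 25%, the column must be filled with Yes; if not, it is filled as shown in the picture.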
Similarly, the 'Within Range' column under 'Volume' is filled if 'Diff Vol' is in the range -10% to 20%.
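As a quick arithmetic check of the two rules, here is a plain-Python sketch for Case 1 (A = 5.00, B = 4.00, C = 0.34, D = 0.32). The helper name pct_diff is hypothetical, introduced only for this illustration.

```python
def pct_diff(ref, other):
    """Hypothetical helper: percentage difference relative to ref, (ref - other) / ref * 100."""
    return (ref - other) / ref * 100

diff_peak = pct_diff(4.00, 5.00)  # (B - A) / B * 100 for Case 1
diff_vol = pct_diff(0.32, 0.34)   # (D - C) / D * 100 for Case 1

# Both range checks include their endpoints
peak_ok = -15 <= diff_peak <= 25
vol_ok = -10 <= diff_vol <= 20

print(round(diff_peak, 2), "Yes" if peak_ok else "No")  # -25.0 No
print(round(diff_vol, 2), "Yes" if vol_ok else "No")    # -6.25 Yes
```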
How can I do this?
Just create new columns:
import numpy as np
df['Diff Peak'] = (df["Peak 'B'"] - df["Peak 'A'"]) / df["Peak 'B'"] * 100
df['Diff Vol'] = (df["Volume 'D'"] - df["Volume 'C'"]) / df["Volume 'D'"] * 100
df['Within Range Peak'] = np.logical_and(df['Diff Peak'] >= -15.0, df['Diff Peak'] <= 25.0)
df['Within Range Vol'] = np.logical_and(df['Diff Vol'] >= -10.0, df['Diff Vol'] <= 20.0)
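This version stores booleans rather than the Yes/No labels the question asks for. A small sketch, using the Diff Peak values of the three sample cases, showing that the np.logical_and test is equivalent to Series.between, and how the mask could be turned into labels with Series.map:

```python
import numpy as np
import pandas as pd

# Diff Peak values of the three sample cases
diff_peak = pd.Series([-25.0, 5.0, 45.0], name="Diff Peak")

# Two comparisons combined with np.logical_and ...
via_logical_and = np.logical_and(diff_peak >= -15.0, diff_peak <= 25.0)
# ... give the same mask as between(), which includes both endpoints by default
via_between = diff_peak.between(-15.0, 25.0)

# Turn the boolean mask into Yes/No labels
labels = via_between.map({True: "Yes", False: "No"})
print(labels.tolist())
```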
If you don't need a MultiIndex in the columns, you can use:
import numpy as np
import pandas as pd
# percentage differences: (B - A) / B * 100 and (D - C) / D * 100
df['Diff Peak'] = df["Peak 'B'"].sub(df["Peak 'A'"]).div(df["Peak 'B'"]).mul(100)
df['Diff Vol'] = df["Volume 'D'"].sub(df["Volume 'C'"]).div(df["Volume 'D'"]).mul(100)
#check range, then add Yes or No
df['Peak Within Range'] = np.where(df['Diff Peak'].between(-15, 25), 'Yes', 'No')
df['Volume Within Range'] = np.where(df['Diff Vol'].between(-10, 20), 'Yes', 'No')
#convert to string, rounding (if necessary), add %
df['Diff Peak'] = df['Diff Peak'].round(2).astype(str) + '%'
df['Diff Vol'] = df['Diff Vol'].round(2).astype(str) + '%'
print (df)
   Case  Peak 'A'  Peak 'B'  Volume 'C'  Volume 'D' Diff Peak Diff Vol  \
0     1       5.0       4.0        0.34        0.32    -25.0%   -6.25%
1     2       5.7       6.0        0.14        0.15      5.0%    6.67%
2     3      11.0      20.0        0.42        0.50     45.0%    16.0%
  Peak Within Range Volume Within Range
0                No                 Yes
1               Yes                 Yes
2                No                 Yes
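One detail of the string conversion: round(2).astype(str) drops trailing zeros, so values come out as '-25.0%' next to '6.67%'. If a fixed two-decimal display is preferred, a format string is one alternative; this is a sketch of that choice, not what the answer above does.

```python
import pandas as pd

# Diff Peak values before formatting (5.000000000000003 is what float arithmetic gives for Case 2)
diff = pd.Series([-25.0, 5.000000000000003, 45.0])

# round(2).astype(str) keeps only the digits that are needed: '-25.0%', '5.0%', ...
as_str = diff.round(2).astype(str) + "%"

# A format string always shows exactly two decimals
fixed = diff.map("{:.2f}%".format)

print(as_str.tolist())
print(fixed.tolist())
```

Either way the result is a string column, so the range checks have to run on the numeric values first, as the answer does.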
But if you need a MultiIndex in the columns:
df = df.set_index('Case')
df['Peak Diff peak'] = df["Peak 'B'"].sub(df["Peak 'A'"]).div(df["Peak 'B'"]).mul(100)
df['Volume Diff Vol'] = df["Volume 'D'"].sub(df["Volume 'C'"]).div(df["Volume 'D'"]).mul(100)
df['Peak Within Range'] = np.where(df['Peak Diff peak'].between(-15, 25), 'Yes', 'No')
df['Volume Within Range'] = np.where(df['Volume Diff Vol'].between(-10, 20), 'Yes', 'No')
df['Peak Diff peak'] = df['Peak Diff peak'].round(2).astype(str) + '%'
df['Volume Diff Vol'] = df['Volume Diff Vol'].round(2).astype(str) + '%'
#keep only the columns that start with Peak
df1 = df.filter(regex='^Peak')
#rename parts of columns
df1.columns = df1.columns.str.replace('Peak', 'Peak (+25% to -15%)_')
#create MultiIndex
df1.columns = df1.columns.str.split('_ ', expand=True)
print (df1)
Peak (+25% to -15%)
'A' 'B' Diff peak Within Range
Case
1 5.0 4.0 -25.0% No
2 5.7 6.0 5.0% Yes
3 11.0 20.0 45.0% No
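Instead of editing the names with str.replace and then splitting on '_ ', the two-level header can also be built explicitly with pd.MultiIndex.from_tuples. A minimal sketch for the Peak group, assuming every column in the group starts with 'Peak ' (the Within Range column and the % formatting are left out for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Peak 'A'": [5.00, 5.70, 11.00],
    "Peak 'B'": [4.00, 6.00, 20.00],
}).set_index("Case")
df["Peak Diff peak"] = (df["Peak 'B'"] - df["Peak 'A'"]) / df["Peak 'B'"] * 100

top = "Peak (+25% to -15%)"
df1 = df.copy()
# Map each flat name to (group label, name without the leading "Peak ")
df1.columns = pd.MultiIndex.from_tuples(
    [(top, c.split(" ", 1)[1]) for c in df.columns]
)
print(df1)
```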
#same steps as for df1, but for the Volume columns
df2 = df.filter(regex='^Volume')
df2.columns = df2.columns.str.replace('Volume', 'Volume (+20% to -10%)_')
df2.columns = df2.columns.str.split('_ ', expand=True)
print (df2)
Volume (+20% to -10%)
'C' 'D' Diff Vol Within Range
Case
1 0.34 0.32 -6.25% Yes
2 0.14 0.15 6.67% Yes
3 0.42 0.50 16.0% Yes
#concatenate both DataFrames side by side into one
df3 = pd.concat([df1, df2], axis=1).reset_index()
print (df3)
Case Peak (+25% to -15%) Volume (+20% to -10%) \
'A' 'B' Diff peak Within Range 'C'
0 1 5.0 4.0 -25.0% No 0.34
1 2 5.7 6.0 5.0% Yes 0.14
2 3 11.0 20.0 45.0% No 0.42
'D' Diff Vol Within Range
0 0.32 -6.25% Yes
1 0.15 6.67% Yes
2 0.50 16.0% Yes
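Another option for the final step is to let pd.concat create the outer level: when it is given a dict of DataFrames with axis=1, the dict keys become the top column level. A minimal sketch that skips the Within Range columns and the % formatting; the sub-frame names peak and vol are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Peak 'A'": [5.00, 5.70, 11.00],
    "Peak 'B'": [4.00, 6.00, 20.00],
    "Volume 'C'": [0.34, 0.14, 0.42],
    "Volume 'D'": [0.32, 0.15, 0.50],
}).set_index("Case")

# Sub-frames that already use the inner column names
peak = pd.DataFrame({"'A'": df["Peak 'A'"], "'B'": df["Peak 'B'"]})
peak["Diff peak"] = (peak["'B'"] - peak["'A'"]) / peak["'B'"] * 100
vol = pd.DataFrame({"'C'": df["Volume 'C'"], "'D'": df["Volume 'D'"]})
vol["Diff Vol"] = (vol["'D'"] - vol["'C'"]) / vol["'D'"] * 100

# Dict keys become the outer column level; reset_index turns Case into ('Case', '')
df3 = pd.concat(
    {"Peak (+25% to -15%)": peak, "Volume (+20% to -10%)": vol}, axis=1
).reset_index()
print(df3)
```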