
Is there a Python function to calculate the diff based on the first element?

I have a pandas dataframe, like this example:

import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78]},
    index=[0, 1, 2, 3, 4])

I would like to calculate the percentage of abandonments in each step, based on the first value.

Output:

Steps   qty   Tx %
Step1   100   0,00%
Step2    95   5,00%
Step3    92   3,00%
Step4    87   5,00%
Step5    78   9,00%

I thought about using df['qty'].pct_change(), but it doesn't work as expected: it measures each change relative to the previous row, not the first value.
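To make the mismatch concrete, here is a minimal sketch comparing the two calculations on the example data. The names vs_previous and vs_first are only illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# pct_change divides each change by the PREVIOUS row's qty,
# so step3 comes out as (95 - 92) / 95, about 3.16%, not 3%.
vs_previous = df['qty'].pct_change() * -100

# The question wants each drop divided by the FIRST row's qty.
vs_first = -df['qty'].diff() / df['qty'].iloc[0] * 100

# Side-by-side view; the first row is NaN in both columns
# because there is no earlier row to compare against.
print(pd.DataFrame({
    'vs_previous': vs_previous.round(2),
    'vs_first': vs_first.round(2),
}))
```

The two columns agree only on the second row, where the previous value and the first value are the same number.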

The manual calculation would be something like:


values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

However, this does not scale as the number of steps or time periods grows.

Could someone suggest a function, or a line of reasoning, that makes this calculation simpler?

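You can use pd.Series.shift to shift the qty column one row down, then subtract the original column from the shifted one. Because the first value here is 100, the difference already reads as a percentage; the first row is NaN because it has no previous row: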

import pandas as pd

df = pd.DataFrame({
    'steps': ['step1','step2', 'step3', 'step4','step5'],
    'qty': [100, 95, 92, 87, 78],
})

df['Tx %'] = df.qty.shift() - df.qty
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   NaN
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0

This works:

# you have
values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

# 1s cancel out, so the above is equivalent to
values = [( - df['qty'][0]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][1]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][2]/df['qty'][0]) + (df['qty'][1]/df['qty'][0]),
          ( - df['qty'][3]/df['qty'][0]) + (df['qty'][2]/df['qty'][0]),
          ( - df['qty'][4]/df['qty'][0]) + (df['qty'][3]/df['qty'][0])]

# since every element is divided by df['qty'][0], you can take it out, so the above is equivalent to
values = [( - df['qty'][0] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][1] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][2] + df['qty'][1]) / df['qty'][0],
          ( - df['qty'][3] + df['qty'][2]) / df['qty'][0],
          ( - df['qty'][4] + df['qty'][3]) / df['qty'][0]]

# the second term is the first term shifted one row down, so it's equivalent to:
values = (- df['qty'] + df['qty'].shift()).fillna(0) / df['qty'][0]

# in summary
df['Tx %'] = (df['qty'].shift() - df['qty']).fillna(0) / df['qty'][0] * 100
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   0.0
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0
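The derivation above could be wrapped in a small helper. This is only a sketch: step_drop_pct is a hypothetical name, and the sample data is made up with a first value other than 100 to show that the normalization matters. It uses .iloc[0] rather than [0] so the first row is picked by position, whatever the index labels are.

```python
import pandas as pd

def step_drop_pct(qty: pd.Series) -> pd.Series:
    """Drop between consecutive rows, as a percentage of the first value."""
    first = qty.iloc[0]      # first row by position, independent of index labels
    drops = -qty.diff()      # previous row minus current row; NaN in the first row
    return (drops.fillna(0) / first * 100).round(2)

# Hypothetical data: same shape as the question, but starting at 200.
df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [200, 190, 184, 174, 156],
})
df['Tx %'] = step_drop_pct(df['qty'])
print(df)
```

Series.diff() is equivalent to subtracting the shifted column, so negating it gives the same "previous minus current" drop used in the answer.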

This answered my question:

df['Tx %'] = round(((1 - df.qty / df.qty[0]) - (1- df.qty.shift() / df.qty[0]))*100, 2)

Thanks to @jfaccioni
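If the column should look like the output in the question ("5,00%", with a decimal comma), one possible way is to format the numbers as strings afterwards. This is a sketch under that assumption; note that the column then holds text, so it is only suitable for display, not for further arithmetic.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# Numeric percentages first; fillna(0) turns the first row's NaN into 0.
pct = (df['qty'].shift() - df['qty']).fillna(0) / df['qty'].iloc[0] * 100

# Format with two decimals and a percent sign, then swap the
# decimal point for a comma to match the question's layout.
df['Tx %'] = pct.map('{:.2f}%'.format).str.replace('.', ',', regex=False)
print(df)
```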
