简体   繁体   中英

How to calculate the difference between columns in python?

I have a pandas dataframe with the following values:

source ACCESS CREATED TERMS SIGNED BUREAU
Facebook 12 8 6
Google 160 136 121
Email 29 26 25

This is a small example of this dataframe, actually there are more rows and columns in them, but maybe for example it should help.

The outuput (in percentage):

source ACCESS CREATED TERMS SIGNED BUREAU
Facebook 0 33.33 25.00
Google 0 15.00 11.03
Email 0 10.34 3.85

How to create a new dataframe with the difference (in percentage) from one column to another, for example: COLUMN A: 12, COLUMN B: 8, so the difference in this step is 33.33%, and from COLUMN C: 6, and the difference from B to C is 25%.

UPDATE

I found this solution:

def percentage_change(col1,col2):
    return ((col2 - col1) / col1) * 100

SO, How can I iterate this for all my columns?

Here's what I would suggest.

I defined three functions:

  1. pairwise_percent_difference() computes percentage differences for each pair of columns which are next to one another. It takes a parameter, columns_excluded , which is the number of columns at the beginning to ignore. I set this to 1, since you want to ignore the source column.
  2. column_difference() is called once for every pair of columns. This step is pretty easy because we can subtract two columns, and Pandas will automatically turn that into subtracting each element of the two columns, one row at a time.
  3. pairwise() is a function which accepts an iterable (eg a list of Pandas column names) and emits an iterable with each pair of neighboring elements in a tuple together. It is copied from the itertools documentation.

Code:

import pandas as pd
import itertools
import io  # Only used for loading example dataframe

s = """source,ACCESS CREATED,TERMS SIGNED,BUREAU
Facebook,12,8,6
Google,160,136,121
Email,29,26,25"""

df = pd.read_csv(io.StringIO(s))

def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

def column_difference(col1, col2):
    return (col1 - col2) / col1 * 100

def pairwise_percent_difference(df, columns_excluded=0):
    output_df = df.iloc[:, :columns_excluded].copy()
    cols_to_compare = df.columns[columns_excluded:]
    for col1, col2 in pairwise(cols_to_compare):
        output_df[col2] = column_difference(df[col1], df[col2])
    return output_df

print(pairwise_percent_difference(df, 1))

Output:

     source   TERMS SIGNED      BUREAU
0  Facebook       33.333333  25.000000
1    Google       15.000000  11.029412
2     Email       10.344828   3.846154

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM