How do I process 2 columns in Pandas and create a new dataframe with new column names

I want to calculate the covariance of each pair of columns listed in my_list. The formula is in the function covariance_formula(...).

My code is as follows:

#!/usr/bin/python3

import pandas as pd
import numpy as np

my_list = ['A', 'B', 'C', 'D', 'E']

def create_df():
    return pd.DataFrame(np.random.randint(0,100,size=(5, 5)).astype(float), columns=my_list)


def iterate_list(df):
    for i in range(len(my_list)):
        for j in range(i + 1, len(my_list)):
            column_one = my_list[i]
            column_two = my_list[j]
            col_name = column_one + " vs." + column_two

            column_1_value = df[df.columns[df.columns.str.startswith(column_one)]]
            column_2_value = df[df.columns[df.columns.str.startswith(column_two)]]
            column_1_mean = df[df.columns[df.columns.str.startswith(column_one)]].mean(axis=0)
            column_2_mean = df[df.columns[df.columns.str.startswith(column_two)]].mean(axis=0)
            df2[col_name] = covariance_formula(column_1_value, column_2_value, column_1_mean, column_2_mean)

    return df2


def covariance_formula(a, b, mean_a, mean_b):
    covar = (a - mean_a) * (b - mean_b)
    return covar


def main():
    df = create_df()
    # print(df)               ## see OUTPUT A 
    df2 = iterate_list(df)    ## <<< THIS IS WHERE I AM HAVING MY PROBLEM
    # print(df2)              ## see EXPECTED OUTPUT B
    print(df2)


if __name__ == "__main__":
    main()

Questions:

How can I create a new DataFrame df2 containing the output shown in EXPECTED OUTPUT B? Is there a faster way of doing it?

Current Problem:

The current problem I am facing is that I cannot seem to get rid of this:

NameError: name 'df2' is not defined

Things I have tried:

OUTPUT A :

      A     B     C     D     E
0  87.0  92.0  66.0   8.0  67.0
1  84.0  18.0   9.0  80.0  41.0
2  38.0  24.0  53.0  25.0  14.0
3  87.0  25.0  19.0   5.0   0.0
4  91.0  69.0  55.0  14.0  90.0

EXPECTED OUTPUT B :

    A vs.B  A vs.C  A vs.D  A vs.E  B vs.C   B vs.D  B vs.E  C vs.D C vs.E  D vs.E
0    445.4   245.8  -176.6   236.2  1187.8   -853.8  1141.4  -471.0  629.8  -452.6
1   -182.2  -207.2   353.8    -9.2   866.6  -1479.4    38.6 -1683.0   44.0   -75.0
2    851.0  -496.4    55.2  1119.0  -272.2     30.2   613.4   -17.6 -357.8    39.8
3   -197.8  -205.4  -205.4  -407.0   440.8    440.8   873.4   458.0  907.4   907.4 
4    318.2   198.6  -168.6   647.4   341.6   -290.2  1113.8  -181.0  695.0  -590.2

You can do this more easily with itertools.combinations(), which yields every pair of columns exactly once, and a dict comprehension that builds one new column per pair:

Code:

def build_covars(covar_df):
    columns = {i + " vs." + j: covariance_formula(covar_df[i], covar_df[j])
               for i, j in it.combinations(covar_df.columns, 2)}
    return pd.concat(columns, axis=1)
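To see what the comprehension iterates over, here is a minimal sketch that only builds the column names. The variable names `cols` and `names` are illustrative and not part of the code above.

```python
import itertools as it

cols = ['A', 'B', 'C', 'D', 'E']

# combinations(cols, 2) yields each unordered pair once, in input order,
# so there is no need for the nested index loops from the question.
names = [i + " vs." + j for i, j in it.combinations(cols, 2)]
print(names)
```

Five columns give ten pairs, from 'A vs.B' through 'D vs.E', which matches the column headers in EXPECTED OUTPUT B.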

Test Code:

import itertools as it
import numpy as np
import pandas as pd

def build_covars(covar_df):
    columns = {i + " vs." + j: covariance_formula(covar_df[i], covar_df[j])
               for i, j in it.combinations(covar_df.columns, 2)}
    return pd.concat(columns, axis=1)

def covariance_formula(a, b):
    return (a - a.mean()) * (b - b.mean())

my_list = ['A', 'B', 'C', 'D', 'E']

def create_df():
    return pd.DataFrame(
        np.random.randint(0, 100, size=(5, 5)).astype(float),
        columns=my_list)

df = create_df()
print(build_covars(df))

Results (the input is random, so the values differ on every run):

    A vs.B   A vs.C   A vs.D   A vs.E   B vs.C   B vs.D   B vs.E   C vs.D   C vs.E   D vs.E
0    52.48    49.92   -43.52   323.84    63.96   -55.76   414.92   -53.04   394.68  -344.08
1   127.68   123.12   184.68    18.24   120.96   181.44    17.92   174.96    17.28    25.92
2   175.48   124.12   -17.12    98.44    47.56    -6.56    37.72    -4.64    26.68    -3.68
3    10.08  -127.68   -57.12  -280.56   -18.24    -8.16   -40.08   103.36   507.68   227.12
4  1370.88   437.92    85.68  1113.84   264.96    51.84   673.92    16.56   215.28    42.12
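
As for the NameError itself: iterate_list() assigns into df2 without ever creating it, and the df2 in main() is a different, local name. Below is a minimal sketch of the original loop with that fixed. It assumes the columns can be selected by exact name instead of with str.startswith(). The startswith() selection returns a one-column DataFrame, and multiplying two DataFrames with different column labels aligns on those labels rather than multiplying element-wise, so it would not give the intended single column. The sample data is copied from OUTPUT A.

```python
import pandas as pd

my_list = ['A', 'B', 'C', 'D', 'E']


def covariance_formula(a, b, mean_a, mean_b):
    # Per-row product of deviations from the column means.
    return (a - mean_a) * (b - mean_b)


def iterate_list(df):
    # Create the result frame up front; this is what the NameError was about.
    df2 = pd.DataFrame(index=df.index)
    for i in range(len(my_list)):
        for j in range(i + 1, len(my_list)):
            column_one = my_list[i]
            column_two = my_list[j]
            col_name = column_one + " vs." + column_two
            # Select each column as a Series by its exact name (assumption:
            # the names in my_list match the DataFrame columns exactly).
            a = df[column_one]
            b = df[column_two]
            df2[col_name] = covariance_formula(a, b, a.mean(), b.mean())
    return df2


# Values taken from OUTPUT A in the question.
df = pd.DataFrame({
    'A': [87.0, 84.0, 38.0, 87.0, 91.0],
    'B': [92.0, 18.0, 24.0, 25.0, 69.0],
    'C': [66.0, 9.0, 53.0, 19.0, 55.0],
    'D': [8.0, 80.0, 25.0, 5.0, 14.0],
    'E': [67.0, 41.0, 14.0, 0.0, 90.0],
})
df2 = iterate_list(df)
print(df2.round(1))
```

With this data the first 'A vs.B' entry works out to (87 - 77.4) * (92 - 45.6), about 445.44, in line with the 445.4 shown in EXPECTED OUTPUT B. The loop and the build_covars() version above should produce the same columns; the comprehension is simply shorter.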
