简体   繁体   中英

Cross tabulation between two sets of columns in a DataFrame

I have tried to get crosstab of data specified by slice.

But something wrong in syntax.

data.csv like the following

ia,ib,ic,id,ie,if,ig  
a,0,0,0,e,0,g         
0,b,0,0,e,f,0         
0,0,c,d,0,f,g         

And then do python3 test.py like the following

import pandas as pd
import enum

df = pd.read_csv('data.csv')

class Slices(enum.Enum):
    first = slice(0, 2)
    second = slice(4, 6)

def getCrosstab(*args):
    cols1 = []
    cols1.append(df.iloc[:, args[0].value])
    cols2 = []
    cols2.append(df.iloc[:, args[1].value])
    print( pd.crosstab(cols1, cols2) )

if __name__ == '__main__':
    getCrosstab(Slices.first, Slices.second)

Expected result:

col2 ie if ig
col1
ia   1  0  1
ib   1  1  0
ic   0  1  1

But I had an error:
ValueError: Shape of passed values is (2, 2), indices imply (2, 3)

I can not fully understand the meaning of this error.

Please give me your guidance.

melt twice, once for each set of columns, and then call crosstab :

u = (df.melt(['ia', 'ib', 'ic'], var_name='C', value_name='D')
       .melt(['C', 'D'], var_name='A', value_name='B')
       .query("B != '0' and D != '0'"))

pd.crosstab(u.A, u.C)

C   id  ie  if  ig
A                 
ia   0   1   0   1
ib   0   1   1   0
ic   1   0   1   1

def crosstab_for(df, sliceA, sliceB):
    u = (df.reindex(df.columns[sliceA] | df.columns[sliceB], axis=1)
           .melt(df.columns[sliceA], var_name='C', value_name='D')
           .melt(['C', 'D'], var_name='A', value_name='B')
           .query("B != '0' and D != '0'"))

    return pd.crosstab(u.A, u.C)

crosstab_for(df, slice(0, 3), slice(4, 7))

C   ie  if  ig
A             
ia   1   0   1
ib   1   1   0
ic   0   1   1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM