Pivot Table of countifs() on Pandas

Question

I have a dataset consisting of an identifier column (key) and several 0/1 flag columns marking characteristics of each row, for example:

In [86]: frame = pd.DataFrame({"key": [1,2,3,4,5,6,7,8,9], "flag1": [0,1,0,1,0,1,0,1,1], "flag2": [0,0,1,1,0,0,1,1,0], "flag3": [0,0,0,0,1,1,1,1,1]}, columns=['key','flag1','flag2','flag3'])

In [87]: frame
Out[87]:
   key  flag1  flag2  flag3
0    1      0      0      0
1    2      1      0      0
2    3      0      1      0
3    4      1      1      0
4    5      0      0      1
5    6      1      0      1
6    7      0      1      1
7    8      1      1      1
8    9      1      0      1

I'm looking to output a dataset that, for each pair of flags, counts the rows where both flags are set, laid out as a pivot table, for example:

   flags  flag1  flag2  flag3
0  flag1      5      2      3
1  flag2      2      4      2
2  flag3      3      2      5

I think I'll have to iterate over frame.keys()[1:] in two nested loops, but I don't know how to populate this second dataset. I'm trying to imitate the behavior of this Google Sheet, but my actual dataset is too large for Sheets/Excel to be usable (about 2 million rows and 60 columns): https://docs.google.com/spreadsheets/d/1emEm9RtxPAFceUgalCVbzr0mGNoZEMFjWwqSjrxyAuE/edit?usp=sharing

Answer 1

Let's remove key, since we don't need it. After that, the solution is pretty much a matrix dot product:

v = frame.drop(columns='key')  # the positional axis form drop('key', 1) no longer works in pandas 2.0+
v.T.dot(v)

       flag1  flag2  flag3
flag1      5      2      3
flag2      2      4      2
flag3      3      2      5
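
To see why this works: each entry of v.T.dot(v) is the sum over rows of flagA * flagB, and with 0/1 flags that product is 1 only when both flags are set. Below is a minimal sketch that checks the dot product against the explicit two-loop count described in the question. The helper name pairwise_counts_loop is made up for illustration, and the sketch assumes every flag column holds only 0 or 1.

```python
import pandas as pd

frame = pd.DataFrame({
    "key": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "flag1": [0, 1, 0, 1, 0, 1, 0, 1, 1],
    "flag2": [0, 0, 1, 1, 0, 0, 1, 1, 0],
    "flag3": [0, 0, 0, 0, 1, 1, 1, 1, 1],
})
v = frame.drop(columns="key")


def pairwise_counts_loop(flags):
    """Hypothetical helper: count rows where both flags are 1, pair by pair."""
    cols = list(flags.columns)
    out = pd.DataFrame(0, index=cols, columns=cols)
    for a in cols:
        for b in cols:
            both_set = (flags[a] == 1) & (flags[b] == 1)
            out.loc[a, b] = int(both_set.sum())
    return out


dot_result = v.T.dot(v)
loop_result = pairwise_counts_loop(v)

# True when the vectorized result and the explicit loops agree everywhere
same = bool((dot_result.to_numpy() == loop_result.to_numpy()).all())
print(same)
```

The loop version does one pass over the data per pair of columns, which is why the single matrix product is the better fit for a wide frame.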

Or, to avoid making a copy, use del to drop the key column in place (note that this modifies frame):

del frame['key']
frame.T.dot(frame)

       flag1  flag2  flag3
flag1      5      2      3
flag2      2      4      2
flag3      3      2      5
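
The same product can also be computed on the underlying NumPy array and then reshaped into the exact layout shown in the question, with a flags column. This is a sketch under the assumption that all non-key columns are 0/1 flags. The cast to int64 is a precaution: if the flags were stored in a small integer dtype such as int8, the sums could overflow on a large dataset.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "key": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "flag1": [0, 1, 0, 1, 0, 1, 0, 1, 1],
    "flag2": [0, 0, 1, 1, 0, 0, 1, 1, 0],
    "flag3": [0, 0, 0, 0, 1, 1, 1, 1, 1],
})
flags = frame.drop(columns="key")

# Cast up front so the accumulated counts cannot overflow a narrow dtype
arr = flags.to_numpy(dtype=np.int64)
counts = pd.DataFrame(arr.T @ arr, index=flags.columns, columns=flags.columns)

# Move the row labels into a regular "flags" column, as in the question
table = counts.rename_axis("flags").reset_index()
print(table)
```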

solution1
3 ACCEPTED 2018-02-07 11:08:49
