
Elegant way to perform column-wise operations on two dataframes

I need to apply an operation to every pair of columns in a dataframe. I came up with a naive solution, but I am wondering whether a more elegant way is available.

The following script counts, for each pair of columns, the number of rows that have a 1 in both columns.

input:

   a  b  c  d
0  0  0  1  0
1  1  1  0  1
2  1  1  1  0

Output:

2   2   1   1
2   2   1   1
1   1   2   0
1   1   0   1

Code:

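import numpy as np
import pandas as pd

# Random 0/1 frame with 3 rows and 4 columns
df = pd.DataFrame(np.random.randint(0, high=2, size=(3, 4)), columns=['a', 'b', 'c', 'd'])
mycolumns = df.columns
for i in range(df.shape[1]):
    for j in range(df.shape[1]):
        # Number of rows where columns i and j are both 1
        print((df[mycolumns[i]] & df[mycolumns[j]]).sum())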

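That is just the matrix product of X' and X, where X' is the transpose of X. For 0/1 data, entry (i, j) of X'X is the sum over rows k of X[k, i] * X[k, j], which is exactly the number of rows where columns i and j are both 1: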

>>> xs = df.values
>>> xs.T.dot(xs)
array([[2, 2, 1, 1],
       [2, 2, 1, 1],
       [1, 1, 2, 0],
       [1, 1, 0, 1]])
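
If you want to keep the column labels on the result, the same product can be taken on the dataframe itself. The following is a minimal sketch that assumes the sample input from the question (hard-coded here instead of random data) and assumes every value is 0 or 1; the name co_counts is only illustrative.

```python
import pandas as pd

# Sample input from the question, hard-coded so the result is reproducible
df = pd.DataFrame(
    [[0, 0, 1, 0],
     [1, 1, 0, 1],
     [1, 1, 1, 0]],
    columns=["a", "b", "c", "d"],
)

# X'X as a labelled dataframe: rows and columns are both the original column names.
# This counts co-occurring ones only because every value is 0 or 1.
co_counts = df.T.dot(df)
print(co_counts)
```

Looking up co_counts.loc["a", "c"] then gives the count for the pair (a, c) directly. For data that is not strictly 0/1, the product would give sums of products rather than counts, so the frame would have to be converted to 0/1 first.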
