I need to apply an operation to every pair of columns in a dataframe. I came up with a naive solution, but I am wondering whether a more elegant way is available.
The following script counts, for each pair of columns, the number of rows that have a one in both columns.
Input:
   a  b  c  d
0  0  0  1  0
1  1  1  0  1
2  1  1  1  0
Output:
2 2 1 1
2 2 1 1
1 1 2 0
1 1 0 1
Code:
import numpy as np
from pandas import DataFrame

df = DataFrame(np.random.randint(0, high=2, size=(3, 4)), columns=['a', 'b', 'c', 'd'])
mycolumns = df.columns
for i in range(df.shape[1]):
    for j in range(df.shape[1]):
        # number of rows where columns i and j are both 1
        print((df[mycolumns[i]] & df[mycolumns[j]]).sum())
That is basically the matrix product of X' and X, where X' is the transpose of X:
>>> xs = df.values
>>> xs.T.dot(xs)
array([[2, 2, 1, 1],
       [2, 2, 1, 1],
       [1, 1, 2, 0],
       [1, 1, 0, 1]])
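If you want to keep the column labels on the result, the same product can be taken on the DataFrame directly. The following is a minimal sketch using the example input from the question; the name `counts` is only illustrative, and it assumes every entry is 0 or 1 (otherwise the product no longer equals the "both are one" count).

```python
import pandas as pd

# Example input from the question (entries assumed to be 0 or 1).
df = pd.DataFrame(
    [[0, 0, 1, 0],
     [1, 1, 0, 1],
     [1, 1, 1, 0]],
    columns=["a", "b", "c", "d"],
)

# X'X computed on the DataFrame itself, so the result is indexed
# by column name on both axes.
counts = df.T.dot(df)
print(counts)
```

Here `counts.loc["a", "c"]` gives the number of rows where both `a` and `c` are one.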