Multiply two pandas dataframes based on their indices and columns
I have two DataFrames with the following structure:
id          0       1       2
time        0   1   0   1   0   1
id  time                                      id   0    1    2
0   0       a1  a2  b1  b2  c1  c2            id
    1       a3  a4  b3  b4  c3  c4            0    w00  w01  w02
1   0       d1  d2  d1  d2  e1  e2     and    1    w10  w11  w12
    1       d3  d4  d3  d4  e3  e4            2    w20  w21  w22
2   0       f1  f2  g1  g2  h1  h2
    1       f3  f4  g3  g4  h3  h4
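I need to obtain a sequence of matrices in which every element of the first DataFrame (indexed by its row and column id) is multiplied by the corresponding element of the second DataFrame (indexed by the same ids), i.e.: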
id          0                |  id          1                |  id          2
time        0       1        |  time        0       1        |  time        0       1
id  time                     |  id  time                     |  id  time
0   0       a1*w00  a2*w00   |  0   0       b1*w01  b2*w01   |  0   0       c1*w02  c2*w02
    1       a3*w00  a4*w00   |      1       b3*w01  b4*w01   |      1       c3*w02  c4*w02
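And so on. My current implementation is shown below, but even with a sample size of only 200 and 3 time periods it takes a long time (and I need to repeat it hundreds of times), so I would like to know whether there is a way to vectorize/optimize it. I don't know whether it matters, but the end goal is to sum all the elements of each resulting matrix.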
import numpy as np
import pandas as pd

N = 3
T = 2
NT = N*T

# JUST GENERATING FAKE DATA
ind = []
for i in range(N):
    for t in range(T):
        i_t = (i,t)
        ind.append(i_t)
index2 = pd.MultiIndex.from_tuples(ind)
eps1 = np.random.randint(1,10,(NT,1))
eps2 = np.random.randint(1,10,(NT,1))
df1 = pd.DataFrame(eps1.dot(eps2.transpose()), index=index2, columns=index2)
w = np.random.normal(0, 1, size=(N,1))
df2 = pd.DataFrame(w.dot(w.transpose()))
E = pd.DataFrame(index=range(N), columns=range(N))

# THIS IS WHAT I NEED TO VECTORIZE/OPTIMIZE
for i in range(N):
    for j in range(N):
        # df1.loc[i][j] is the T x T block for row id i and column id j;
        # assign with E.loc[i, j] rather than chained E.loc[i][j] indexing
        E.loc[i, j] = (df1.loc[i][j] * df2.loc[i, j]).to_numpy().sum()

E
Try:
(df1.mul(df2, level=0)            # multiply the two frames, aligning on level 0 (id)
    .groupby(level=0).sum()       # sum the rows within each row id
    .T.groupby(level=0).sum().T   # sum the columns within each column id
)
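Since the end goal is only the sum of each block, the weight can be factored out: every element of block (i, j) is multiplied by the same scalar from df2, so the block sum is that scalar times the unweighted block sum. Below is a minimal NumPy sketch of that idea, not a drop-in replacement. It assumes df1 is sorted by (id, time) on both axes and that every id has exactly T periods, so its values can be reshaped to (N, T, N, T). The names `blocks`, `block_sums`, and `E_fast` are illustrative, and the fake data mirrors the question but uses a seeded generator.

```python
import numpy as np
import pandas as pd

N, T = 3, 2
rng = np.random.default_rng(0)

# Fake data with the same layout as in the question.
index2 = pd.MultiIndex.from_product([range(N), range(T)])
eps1 = rng.integers(1, 10, (N * T, 1))
eps2 = rng.integers(1, 10, (N * T, 1))
df1 = pd.DataFrame(eps1 @ eps2.T, index=index2, columns=index2)
w = rng.normal(0, 1, size=(N, 1))
df2 = pd.DataFrame(w @ w.T)

# Assumption: rows and columns of df1 are ordered by (id, time) with T
# periods per id, so the axes become (row id, row time, col id, col time).
blocks = df1.to_numpy().reshape(N, T, N, T)

# Sum every T x T block over both time axes, giving an N x N array.
block_sums = blocks.sum(axis=(1, 3))

# The weight is constant within a block, so apply it after summing.
E_fast = pd.DataFrame(block_sums * df2.to_numpy(),
                      index=df2.index, columns=df2.columns)
```

This avoids label-based alignment entirely, which is why it depends on the ordering assumption. If the index might be unsorted or unbalanced, the groupby version above is the safer choice.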