
Multiply column in dataframe with one row in another dataframe

I'm having problems multiplying values in two different dataframes. I'm doing a PCA regression and want to multiply all my loadings with the original values.

For example:

PCA dataframe (loadings):

PC1 PC2
X 0 1
X1 1 2
X2 2 1
X3 2 1
X4 3 2
X5 5 4

Original dataframe:

A A1 A2 A3 A4 A5
1 1 3 4 1 2 4
2 8 5 3 2 1 2
3 9 3 5 1 3 1

I then want to multiply PC1 with every row in the original dataframe such that:

PC1 = 0xA + 1xA1 + 2xA2 + 2xA3 + 3xA4 + 5xA5

Inserting the first row from the second dataframe: PC1 = 0x1 + 1x3 + 2x4 + 2x1 + 3x2 + 5x4 = 39
Second row: PC1 = 0x8 + 1x5 + 2x3 + 2x2 + 3x1 + 5x2 = 28
Third row: PC1 = 0x9 + 1x3 + 2x5 + 2x1 + 3x3 + 5x1 = 29

New dataframe:

   PC1  PC2
1   39  ...
2   28  ...
3   29  ...

And so on.
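As a quick check of the arithmetic above, here is a minimal plain-Python sketch of the per-row weighted sum. It uses the example numbers from the two tables. The variable and function names are illustrative and do not come from the question.

```python
# Loadings for PC1 and PC2, in the column order A, A1, A2, A3, A4, A5
pc1 = [0, 1, 2, 2, 3, 5]
pc2 = [1, 2, 1, 1, 2, 4]

# Rows of the original dataframe (index 1, 2, 3)
rows = [
    [1, 3, 4, 1, 2, 4],
    [8, 5, 3, 2, 1, 2],
    [9, 3, 5, 1, 3, 1],
]

def score(row, loadings):
    """Weighted sum of one row: sum of value * loading over matching columns."""
    return sum(v * w for v, w in zip(row, loadings))

pc1_scores = [score(r, pc1) for r in rows]
pc2_scores = [score(r, pc2) for r in rows]
print(pc1_scores)  # [39, 28, 29]
print(pc2_scores)  # [32, 33, 31]
```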

My PCA dataframe has the shape (14, 4) and my value dataframe has the shape (159, 14).
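With the real shapes, the same calculation is a matrix product. A (159, 14) value matrix times a (14, 4) loading matrix gives a (159, 4) score matrix, where entry (i, k) is the sum over j of value[i, j] x loading[j, k]. Below is a minimal sketch with random placeholder data (not the asker's data). It assumes the rows of the loadings are in the same order as the value columns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random placeholders with the shapes from the question
loadings = rng.normal(size=(14, 4))   # 14 original variables x 4 components
values = rng.normal(size=(159, 14))   # 159 observations x 14 variables

# (159, 14) @ (14, 4) -> (159, 4): the inner dimension (14) must match
scores = values @ loadings
print(scores.shape)  # (159, 4)

# Each entry is the weighted sum described above, e.g. observation 0 on component 0
assert np.isclose(scores[0, 0], sum(values[0, j] * loadings[j, 0] for j in range(14)))
```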

If the number of rows in the first DataFrame equals the number of columns in the second DataFrame, you can multiply with DataFrame.dot by passing the first DataFrame as a numpy array, then rename the resulting columns using df1.columns:

df = df2.dot(df1.to_numpy()).rename(columns=dict(enumerate(df1.columns)))
print (df)
   PC1  PC2
1   39   32
2   28   33
3   29   31
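DataFrame.dot matches the columns of the left frame with the index of the right frame by label. It raises a ValueError when they do not line up (here A..A5 vs X..X5), which is why the answer above passes a numpy array instead. The following sketch shows the label-based alternative. It assumes the loadings' rows are in the same order as df2's columns, relabels the loadings accordingly, and lets pandas keep the PC1/PC2 names and the row index:

```python
import pandas as pd

# Loadings from the question; rows are labelled X..X5
df1 = pd.DataFrame({"PC1": [0, 1, 2, 2, 3, 5], "PC2": [1, 2, 1, 1, 2, 4]},
                   index=["X", "X1", "X2", "X3", "X4", "X5"])
# Original values; columns are labelled A..A5
df2 = pd.DataFrame([[1, 3, 4, 1, 2, 4],
                    [8, 5, 3, 2, 1, 2],
                    [9, 3, 5, 1, 3, 1]],
                   index=[1, 2, 3], columns=["A", "A1", "A2", "A3", "A4", "A5"])

# Give the loadings' rows the same labels as df2's columns (assumes same order),
# so DataFrame.dot can align them by label
aligned = df1.set_axis(df2.columns, axis=0)
result = df2.dot(aligned)
print(result)
```

Calling df2.dot(df1) directly, without relabelling, fails because the labels differ.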

You are looking for a dot product, which you can get with np.dot:

print(df)
    2  3
1       
X   0  1
X1  1  2
X2  2  1
X3  2  1
X4  3  2
X5  5  4
print(xf)
   2  3  4  5  6  7
1                  
1  1  3  4  1  2  4
2  8  5  3  2  1  2
3  9  3  5  1  3  1
print(pd.DataFrame(np.dot(xf, df), columns=['PC1', 'PC2']))
   PC1  PC2
0   39   32
1   28   33
2   29   31
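Note that np.dot returns a plain array, so the result above is renumbered 0, 1, 2 instead of keeping xf's index 1, 2, 3. The variant below rebuilds df and xf with the same numbers (the labels are illustrative). It converts explicitly with to_numpy() and passes the original index back in:

```python
import pandas as pd

# Same numbers as the answer above; the labels are illustrative
df = pd.DataFrame([[0, 1], [1, 2], [2, 1], [2, 1], [3, 2], [5, 4]],
                  index=["X", "X1", "X2", "X3", "X4", "X5"])
xf = pd.DataFrame([[1, 3, 4, 1, 2, 4],
                   [8, 5, 3, 2, 1, 2],
                   [9, 3, 5, 1, 3, 1]], index=[1, 2, 3])

# The matrix product of the raw arrays has no labels, so reattach xf's index
scores = pd.DataFrame(xf.to_numpy() @ df.to_numpy(),
                      index=xf.index, columns=["PC1", "PC2"])
print(scores)
```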

Use:

import numpy as np
import pandas as pd

string = """    PC1 PC2
X   0   1
X1  1   2
X2  2   1
X3  2   1
X4  3   2
X5  5   4"""
string2 = """A  A1  A2  A3  A4  A5
1   3   4   1   2   4
8   5   3   2   1   2
9   3   5   1   3   1"""
# Split each line on two spaces (this relies on the exact spacing above)
data1 = [x.split('  ') for x in string.split('\n')]
data2 = [x.split('  ') for x in string2.split('\n')]

# Loadings: drop the row labels (X, X1, ...); the header line ends with "PC1 PC2"
df1 = pd.DataFrame(np.array([x[1:] for x in data1[1:]], dtype=float), columns=data1[0][-1].split())
df2 = pd.DataFrame(np.array(data2[1:], dtype=float), columns=data2[0])

# Solution
pd.DataFrame(np.dot(df2, df1), columns=['PC1', 'PC2'])
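The split('  ') parsing depends on the exact number of spaces in the strings. The sketch below is an alternative and is not the answerer's code. pd.read_csv with sep=r"\s+" splits on any run of whitespace. Because the header of string has one field fewer than its data rows, it uses the first column (X..X5) as the index:

```python
import io

import pandas as pd

string = """    PC1 PC2
X   0   1
X1  1   2
X2  2   1
X3  2   1
X4  3   2
X5  5   4"""
string2 = """A  A1  A2  A3  A4  A5
1   3   4   1   2   4
8   5   3   2   1   2
9   3   5   1   3   1"""

# Whitespace-separated parsing; the extra leading field in each row of `string`
# becomes the index, so df1 keeps the labels X..X5 and the columns PC1, PC2
df1 = pd.read_csv(io.StringIO(string), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(string2), sep=r"\s+")

# Multiply by position (the labels A..A5 and X..X5 differ) and keep the PC names
result = pd.DataFrame(df2.to_numpy() @ df1.to_numpy(), columns=df1.columns)
print(result)
```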

