Regression coefficient calculation in Python

I have a DataFrame produced with pandas and an input text file of activity values. I want to find the regression coefficient of each term using the following formula:

Y = C1a*X1a + C1b*X1b + ... + C2a*X2a + C2b*X2b + ... + C0

where Y is the activity, Cna is the regression coefficient for residue choice a at position n, Xna is the dummy variable (1 or 0) coding the presence or absence of residue choice a at position n, and C0 is the mean value of the activity. My DataFrame looks like this:

2u  2s  4r  4n  4m  7h  7v
0   1   1   0   0   0   1
0   1   0   1   0   0   1
1   0   0   1   0   1   0
1   0   0   0   1   1   0
1   0   1   0   0   1   0

Here 1 and 0 represent the presence and absence of a residue, respectively. Using MLR (multiple linear regression), how can I find the regression coefficient of each residue, i.e. 2u, 2s, 4r, 4n, 4m, 7h and 7v? C1a is the regression coefficient of residue a at the first position (here 1a is 2u, 1b is 2s, 2a is 4r, ...), and X1a is the corresponding dummy value (0 or 1). The activity file contains the following data:

6.5
5.9
5.7
6.4
5.2

So the first equation will look like this:

6.5 = C1a*0 + C1b*1 + C2a*1 + C2b*0 + C2c*0 + C3a*0 + C3b*1 + C0 …
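Stacking the equation for every compound gives the whole system in matrix form. The columns are taken in the table's order. This assumes, as the first equation above suggests, that C1a, C1b, C2a, C2b, C2c, C3a, C3b correspond to 2u, 2s, 4r, 4n, 4m, 7h, 7v:

```latex
\begin{pmatrix}6.5\\5.9\\5.7\\6.4\\5.2\end{pmatrix}
=
\begin{pmatrix}
0&1&1&0&0&0&1\\
0&1&0&1&0&0&1\\
1&0&0&1&0&1&0\\
1&0&0&0&1&1&0\\
1&0&1&0&0&1&0
\end{pmatrix}
\begin{pmatrix}C_{1a}\\C_{1b}\\C_{2a}\\C_{2b}\\C_{2c}\\C_{3a}\\C_{3b}\end{pmatrix}
+ C_0\begin{pmatrix}1\\1\\1\\1\\1\end{pmatrix}
```

Each compound has exactly one residue choice per position, so the 2u and 2s columns add up to the column of ones that multiplies C0. The same holds for 4r/4n/4m and for 7h/7v. This system therefore has no unique solution without an extra constraint.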

Can I get the regression coefficients using NumPy? All suggestions will be appreciated.

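Let A be your DataFrame as a plain NumPy array (for a pandas DataFrame, df.values gives it; if the data is in a text file, read it with np.loadtxt, passing delimiter="," for a CSV file). Let y be your activity values, again as a NumPy array. Then solve the least-squares problem with np.linalg.lstsq: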

import numpy as np

# Dummy-coded residue table (columns 2u, 2s, 4r, 4n, 4m, 7h, 7v)
DF = """0     1     1      0      0     0    1
0     1     0      1      0     0    1
1     0     0      1      0     1    0
1     0     0      0      1     1    0
1     0     1      0      0     1    0"""

# Activity values, comma-separated
activity = """6.5,  5.9,  5.7,  6.4,  5.2"""

A = np.fromstring(DF, sep=" ").reshape((5, 7))
y = np.fromstring(activity, sep=",")  # separator must match the commas

# resid is empty here because A has fewer rows than columns
(x, resid, rank, svals) = np.linalg.lstsq(A, y, rcond=None)

print(x)
# 2.115625,  2.490625,  1.24375 ,  1.19375 ,  2.16875 ,  2.115625, 2.490625
print(np.sum((y - A.dot(x))**2))  # Sum of squared residuals
# approx. 0.3025
print(A.dot(x))  # Predictions
# 6.225,  6.175,  5.425,  6.4  ,  5.475
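The code above fits A·x ≈ y without the constant term C0 from the question's model. Below is a minimal sketch of one way to include it, assuming the column grouping 2u/2s, 4r/4n/4m, 7h/7v. It appends a column of ones for C0 and solves with lstsq, which returns the minimum-norm solution because the system is rank-deficient. It then shifts the coefficients so that, at each position, the occurrence-weighted coefficients sum to zero. That constraint is the one commonly used in Free-Wilson analysis, and under it C0 comes out as the mean activity, as the question describes. The predictions do not change, because every compound has exactly one residue per position. Note that in this data 7h and 7v have exactly the same 0/1 pattern as 2u and 2s, so their effects cannot be separated; the split shown is simply lstsq's minimum-norm choice.

```python
import numpy as np

columns = ["2u", "2s", "4r", "4n", "4m", "7h", "7v"]
A = np.array([
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1, 0],
], dtype=float)
y = np.array([6.5, 5.9, 5.7, 6.4, 5.2])

# Assumed grouping of columns by position (indices into `columns`)
positions = [[0, 1], [2, 3, 4], [5, 6]]

# Prepend a column of ones for C0 and solve the rank-deficient system
X = np.column_stack([np.ones(len(y)), A])
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
c0, coef = beta[0], beta[1:].copy()

# Free-Wilson-style reparametrisation: at each position, subtract the
# occurrence-weighted mean coefficient and move that amount into C0.
counts = A.sum(axis=0)  # number of compounds carrying each residue
for idx in positions:
    shift = np.dot(counts[idx], coef[idx]) / len(y)
    coef[idx] -= shift
    c0 += shift

pred = c0 + A @ coef  # same predictions as before the shift
print(f"rank: {rank} (X has {X.shape[1]} columns)")  # rank: 4 (X has 8 columns)
print(f"C0 = {c0:.4f}, mean activity = {y.mean():.4f}")  # both 5.9400
for name, c in zip(columns, coef):
    print(f"{name}: {c:+.4f}")
print("sum of squared residuals:", np.sum((y - pred) ** 2))
```

Any other constraint that fixes one coefficient per position (for example, dropping one dummy column per position as a reference) gives the same predictions but different coefficient values.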
