
Multivariate regression with numpy in Python

I'm learning machine learning with Python and have a question about regression. I've done some simple regressions (linear and polynomial), but my question is about multivariate regression. So far I have only worked with a single input array x and an output y.

Suppose I have some data about forest fires (http://archive.ics.uci.edu/ml/datasets/Forest+Fires):

X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0,2
7,4,oct,tue,90.6,35.4,669.1,6.7,18,33,0.9,0,12

In this case my input is not a single array but a matrix, and my output is the burned area.

So for the data above, the input X is

X = [[7, 5, 'mar', 'fri', 86.2, 26.2, 94.3, 5.1, 8.2, 51, 6.7, 0],
     [7, 4, 'oct', 'tue', 90.6, 35.4, 669.1, 6.7, 18, 33, 0.9, 0]]

and the output is

Y = [2, 12]
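A sketch of turning the CSV rows into numpy arrays, using only the two sample rows shown above (embedded with io.StringIO so it runs on its own; with the real file you would pass its path to np.genfromtxt instead). The string columns month and day are set aside, since a numeric model cannot use them as they are. The name `features` is used for the input matrix only to avoid confusion with the X and Y columns in the file.

```python
import io

import numpy as np

# The two sample rows from the question, embedded so the sketch runs on its own.
# With the real data you would pass the path to the downloaded CSV file instead.
CSV_TEXT = """X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0,2
7,4,oct,tue,90.6,35.4,669.1,6.7,18,33,0.9,0,12
"""

# names=True takes the field names from the header row; dtype=None infers a
# type per column, so month and day come back as strings.
data = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True,
                     dtype=None, encoding="utf-8")

NUMERIC_COLS = ["X", "Y", "FFMC", "DMC", "DC", "ISI", "temp", "RH", "wind", "rain"]

# (n_samples, n_features) float matrix holding only the numeric inputs.
features = np.column_stack([data[name].astype(float) for name in NUMERIC_COLS])
# Categorical inputs, kept aside because they need encoding before regression.
months = data["month"]
days = data["day"]
# Output: burned area.
target = data["area"].astype(float)

print(features.shape)  # (2, 10)
```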

How do I perform this kind of regression? I don't want code, just some ideas about multivariate regression. I'm using numpy, but perhaps other libraries are better suited to this problem.
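Multivariate linear regression is the same least-squares problem as the one-input case, except that there is one coefficient per input column plus an intercept. Below is a minimal numpy sketch using np.linalg.lstsq. The data are randomly generated with known coefficients (purely illustrative, not the forest fire data) so you can check that the fit recovers them; `fit_linear` and `predict` are hypothetical helper names.

```python
import numpy as np


def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    # Prepend a column of ones so the first fitted coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    # lstsq finds beta minimizing ||design @ beta - target||^2, i.e. one
    # coefficient per input column: the multivariate version of y = a + b*x.
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, features):
    """Predicted output for each row of `features`."""
    return intercept + features @ coefs


# Synthetic data for illustration only (not the forest fire data):
# 100 samples, 3 inputs, known coefficients plus a little noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
true_coefs = np.array([1.5, -2.0, 0.5])
target = 3.0 + features @ true_coefs + rng.normal(scale=0.1, size=100)

intercept, coefs = fit_linear(features, target)
print(intercept, coefs)  # should be close to 3.0 and [1.5, -2.0, 0.5]
```

The same function applies to the forest fire matrix once every column is numeric. Note that with only a couple of rows and a dozen columns the system is underdetermined, so a meaningful fit needs the full dataset. If you prefer a library, scikit-learn's `sklearn.linear_model.LinearRegression` performs an ordinary least-squares fit behind a `fit`/`predict` interface.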

Treating categorical variables such as day or month as ordinary numeric inputs to a linear regression usually makes no sense. What you want to do instead is transform the month variable into 12 binary variables (look up "dummy variable"), one each for January, February, and so forth, and omit any one of them so that the model is identified: with an intercept in the model, the 12 dummies always sum to 1, which makes the design matrix perfectly collinear. The coefficient on each remaining dummy then gives the difference in the conditional mean relative to the omitted month.
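A numpy-only sketch of the dummy encoding described above. It assumes the month and day columns use the three-letter lowercase abbreviations seen in the sample rows, and takes January (and Monday) as the omitted baseline; `make_dummies` is a hypothetical helper, not a numpy function. The second half checks the identification point numerically: keeping all 12 dummies alongside an intercept leaves the design matrix one rank short.

```python
import numpy as np

# Assumption: levels are the three-letter lowercase abbreviations seen in the
# sample rows ("mar", "oct", "fri", "tue").
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def make_dummies(values, levels, baseline):
    """One 0/1 column per level except `baseline`, which is omitted."""
    kept = [lv for lv in levels if lv != baseline]
    values = np.asarray(values)
    # Compare every row against every kept level -> (n_rows, len(kept)) matrix.
    dummies = (values[:, None] == np.array(kept)[None, :]).astype(float)
    return dummies, kept


month_dummies, month_names = make_dummies(["mar", "oct", "jan"], MONTHS, "jan")
print(month_dummies.shape)  # (3, 11); the "jan" row is all zeros
day_dummies, day_names = make_dummies(["fri", "tue"], DAYS, "mon")
print(day_dummies.shape)    # (2, 6)

# Why one level must be dropped: with all 12 month dummies plus an intercept,
# the dummy columns sum to the intercept column, so the design matrix is
# rank-deficient and the coefficients are not uniquely determined.
every_month = np.array(MONTHS * 2)  # each month appears twice
full = (every_month[:, None] == np.array(MONTHS)[None, :]).astype(float)
with_intercept = np.column_stack([np.ones(len(every_month)), full])
print(with_intercept.shape[1], np.linalg.matrix_rank(with_intercept))  # 13 12

# Dropping the baseline month restores full column rank.
reduced, _ = make_dummies(every_month, MONTHS, "jan")
reduced_design = np.column_stack([np.ones(len(every_month)), reduced])
print(reduced_design.shape[1], np.linalg.matrix_rank(reduced_design))  # 12 12
```

The encoded columns are then stacked next to the numeric inputs (for example with `np.column_stack`) before fitting, and the coefficient on, say, the "mar" column is the estimated difference between March and January, holding the other inputs fixed. With pandas, `pd.get_dummies(df, columns=["month", "day"], drop_first=True)` produces an equivalent encoding, dropping one level per column.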
