Treating data as categorical in linear regression

I have data in a csv file that looks somewhat like this:

column1    column2
   b          2
   c          4
   z          1
   g          3
...

(This is not the real data.) column1 is categorical and column2 is continuous, and I want to carry out linear regression on this data. My code looks like this at the moment:

import pandas as pd
from sklearn import linear_model


# Function to get data from the csv file.
def import_data(file_name):
    df = pd.read_csv(file_name).drop_duplicates()
    X_parameter = []
    Y_parameter = []
    for alpha, beta in zip(df['column1'], df['column2']):
        # float() requires column1 to already hold numeric codes
        X_parameter.append([float(alpha)])
        Y_parameter.append(float(beta))
    return X_parameter, Y_parameter


X, Y = import_data(filename)


def linear_model_main(X_parameters, Y_parameters, predict_value):
    # Create linear regression object
    regress = linear_model.LinearRegression()
    regress.fit(X_parameters, Y_parameters)
    prediction_outcome = regress.predict(predict_value)
    predictions = {}
    predictions['intercept'] = regress.intercept_
    predictions['coefficient'] = regress.coef_
    predictions['predicted_value'] = prediction_outcome
    return predictions

I'm not sure how to specify in this code that column1 is categorical. I tried converting it to numerical data (a = 1, b = 2, ...), but it is then treated as a continuous variable.

You can use pd.get_dummies to convert the categorical column into dummy (indicator) variables, one 0/1 column per category:

>>> pd.concat([df, pd.get_dummies(df.column1)], axis=1)
  column1  column2  b  c  g  z
0       b        2  1  0  0  0
1       c        4  0  1  0  0
2       z        1  0  0  0  1
3       g        3  0  0  1  0
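Two options of get_dummies are worth knowing here. Depending on the pandas version, the dummy columns may come back as booleans rather than 0/1 integers, and the full set of dummy columns always sums to 1 in every row, which makes it collinear with the intercept. The following is a minimal sketch, assuming only the four sample rows from the question, that requests integer output with dtype and drops one level with drop_first so that the dropped category acts as the baseline:

```python
import pandas as pd

# Sample rows copied from the question (not the real data).
df = pd.DataFrame({"column1": ["b", "c", "z", "g"], "column2": [2, 4, 1, 3]})

# dtype=int forces 0/1 output; drop_first=True drops the first level ("b"),
# which then serves as the baseline category.
dummies = pd.get_dummies(df["column1"], dtype=int, drop_first=True)
print(dummies)
```

Whether to drop a level is a modelling choice: the predictions are the same either way, but with a level dropped each coefficient reads as the difference from the baseline category.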

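EDIT: Assign the concatenated frame back to df, then drop the original column and move column2 to the end:

>>> df = pd.concat([df, pd.get_dummies(df.column1)], axis=1)
>>> del df['column1']
>>> df = df[['b', 'c', 'g', 'z', 'column2']]
>>> df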
   b  c  g  z  column2
0  1  0  0  0        2
1  0  1  0  0        4
2  0  0  0  1        1
3  0  0  1  0        3

The dummy columns are then the features and column2 is the target:

regress.fit(df.iloc[:, :-1].values, df.iloc[:, -1].values)
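Putting the pieces together, the sketch below fits the model on the dummy columns and predicts for a single category. It assumes the four sample rows from the question, and predict_for is a hypothetical helper introduced only for illustration; the point it shows is that a new value must be encoded with the same dummy columns, in the same order, as the training data.

```python
import pandas as pd
from sklearn import linear_model

# Sample rows copied from the question (not the real data).
df = pd.DataFrame({"column1": ["b", "c", "z", "g"], "column2": [2, 4, 1, 3]})

# One 0/1 column per category; the column order is reused at prediction time.
X = pd.get_dummies(df["column1"], dtype=int)
y = df["column2"]

regress = linear_model.LinearRegression()
regress.fit(X.values, y.values)


def predict_for(category):
    # Hypothetical helper: build a one-row design matrix for one category,
    # using the same columns (and order) as the training matrix X.
    row = [[int(col == category) for col in X.columns]]
    return float(regress.predict(row)[0])


print(predict_for("z"))
```

With a single categorical predictor, the fitted value for each category is the mean of column2 within that category, so with one row per category, as here, the model simply reproduces the observed values.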
