I would like to use the statsmodels linear regression model, but I get the following error:
Traceback (most recent call last):
File "C:\Users\aleks\PycharmProjects\statistics\econometrics.py", line 95, in <module>
lr = sm.OLS.from_formula('rj13.2 ~ age+C(rh5)+C(r_diplom)+C(status)+C(rh6)+C(rj1.1.1)',df_stats_models).fit()
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\statsmodels\base\model.py", line 200, in from_formula
tmp = handle_formula_data(data, None, formula, depth=eval_env,
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\statsmodels\formula\formulatools.py", line 63, in handle_formula_data
result = dmatrices(formula, Y, depth, return_type='dataframe',
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\highlevel.py", line 309, in dmatrices
(lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\highlevel.py", line 164, in _do_highlevel_design
design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\highlevel.py", line 66, in _try_incr_builders
return design_matrix_builders([formula_like.lhs_termlist,
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\build.py", line 689, in design_matrix_builders
factor_states = _factors_memorize(all_factors, data_iter_maker, eval_env)
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\build.py", line 354, in _factors_memorize
which_pass = factor.memorize_passes_needed(state, eval_env)
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\eval.py", line 474, in memorize_passes_needed
subset_names = [name for name in ast_names(self.code)
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\eval.py", line 474, in <listcomp>
subset_names = [name for name in ast_names(self.code)
File "C:\Users\aleks\PycharmProjects\statistics\venv\lib\site-packages\patsy\eval.py", line 105, in ast_names
for node in ast.walk(ast.parse(code)):
File "C:\Users\aleks\AppData\Local\Programs\Python\Python39\lib\ast.py", line 50, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
C(rj1 .1 .1)
^
SyntaxError: invalid syntax
My code:
lr = sm.OLS.from_formula('rj13.2 ~ age+C(rh5)+C(r_diplom)+C(status)+C(rh6)+C(rj1.1.1)',df_stats_models).fit()
print(lr.summary())
The columns and df_stats_models.head() look like this:
Index(['rj13.2', 'rh6', 'rh5', 'r_diplom', 'status', 'rj1.1.1', 'age'], dtype='object')
rj13.2 rh6 rh5 ... status rj1.1.1 age
46 30000.0 1986.0 МУЖСКОЙ ... областной центр ПОЛНОСТЬЮ УДОВЛЕТВОРЕНЫ 27.0
178 22000.0 1992.0 МУЖСКОЙ ... город СКОРЕЕ УДОВЛЕТВОРЕНЫ 21.0
271 10200.0 1964.0 ЖЕНСКИЙ ... город СКОРЕЕ УДОВЛЕТВОРЕНЫ 49.0
537 6000.0 1952.0 ЖЕНСКИЙ ... город СКОРЕЕ УДОВЛЕТВОРЕНЫ 61.0
538 13000.0 1964.0 ЖЕНСКИЙ ... город СКОРЕЕ УДОВЛЕТВОРЕНЫ 49.0
Why does it get angry at C(rj1.1.1)?
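To read R-style formulas, statsmodels uses the patsy package, which parses each term as Python code. Column names containing special characters (such as . or -) are not valid Python variable names, which is why the traceback ends in a SyntaxError raised by ast.parse on C(rj1 .1 .1). To "protect" such names, wrap them in patsy's Q() function (use double quotes around the formula so the single-quoted names can sit inside it):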
lr = sm.OLS.from_formula("Q('rj13.2') ~ age+C(rh5)+C(r_diplom)+C(status)+C(rh6)+C(Q('rj1.1.1'))", df_stats_models).fit()
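If many columns have such names, the quoting can be automated. The following is only a minimal sketch using the standard library: parses_as_python, quote_name and build_formula are hypothetical helpers introduced here for illustration (they are not part of patsy or statsmodels), and the sketch assumes column names contain no single quotes or backslashes. It also reproduces the ast.parse check that appears in the traceback.

```python
import ast
import keyword


def parses_as_python(expr):
    """Return True if expr is valid Python source (the check that failed in the traceback)."""
    try:
        ast.parse(expr)
        return True
    except SyntaxError:
        return False


def quote_name(name):
    """Hypothetical helper: wrap a column name in Q('...') unless it is a plain identifier."""
    if name.isidentifier() and not keyword.iskeyword(name):
        return name
    # Assumption: the name contains no single quotes or backslashes.
    return "Q('{}')".format(name)


def build_formula(response, numeric, categorical):
    """Hypothetical helper: build a formula string, marking categorical columns with C()."""
    terms = [quote_name(n) for n in numeric]
    terms += ["C({})".format(quote_name(n)) for n in categorical]
    return "{} ~ {}".format(quote_name(response), " + ".join(terms))


# The unquoted term is not valid Python; the quoted one is.
print(parses_as_python("C(rj1.1.1)"))
print(parses_as_python("C(Q('rj1.1.1'))"))

# Column names taken from the question.
formula = build_formula(
    "rj13.2", ["age"], ["rh5", "r_diplom", "status", "rh6", "rj1.1.1"]
)
print(formula)
```

The resulting string can be passed to sm.OLS.from_formula(formula, df_stats_models) in the same way as the hand-written formula above. Renaming the columns to valid identifiers before fitting would be another way around the problem.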