

How do I select specific coefficients in R when trying to find a perfect fit?

I am looking for an opinion. I am new to R, and for work I am trying to build a tariff pricing structure using the following variables: exposure, zone, vehicle age and driver age (both categorical; I was able to create groups based on age), fuel, and car brand (also categorical).

Looking at the data, I noticed some overdispersion, so I fitted a negative binomial model instead. I also managed to improve the model somewhat using likelihood-ratio (chi-squared) tests via the anova function.
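A minimal sketch of that workflow on simulated data: a rough overdispersion check on a Poisson fit, then two nested negative binomial fits with MASS::glm.nb() compared by a likelihood-ratio (chi-squared) test through anova(). The data frame policies and its columns (claims, exposure, zone, fuel) are hypothetical stand-ins, and entering exposure as offset(log(exposure)) is an assumption about how the original model was specified.

```r
library(MASS)  # glm.nb(); anova() on two negbin fits gives a likelihood-ratio test

set.seed(42)
n <- 5000
# Hypothetical simulated portfolio (not the data from the question)
policies <- data.frame(
  exposure = runif(n, 0.1, 1),
  zone     = factor(sample(c("B", "C", "D"), n, replace = TRUE)),
  fuel     = factor(sample(c("D", "E"), n, replace = TRUE))
)
# True log claim rate: zone D raises it, fuel E lowers it
log_rate <- 0.4 * (policies$zone == "D") - 0.3 * (policies$fuel == "E")
policies$claims <- rnbinom(n, size = 1.5, mu = policies$exposure * exp(log_rate))

# Rough overdispersion check: Pearson chi-squared / residual df well above 1
pois <- glm(claims ~ zone + fuel + offset(log(exposure)),
            family = poisson, data = policies)
dispersion <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)

# Nested negative binomial models, without and with fuel
nb_small <- glm.nb(claims ~ zone + offset(log(exposure)), data = policies)
nb_full  <- glm.nb(claims ~ zone + fuel + offset(log(exposure)), data = policies)

# Likelihood-ratio (chi-squared) test of adding fuel
lrt <- anova(nb_small, nb_full)
print(lrt)
```

A Pearson dispersion ratio clearly above 1 on the Poisson fit is what usually motivates the switch to a negative binomial; a small Pr(Chi) in the anova() table favours keeping the extra term.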

However, I did notice something odd. Looking at the brand coefficients (levels 2 to 14), some are significant at the 5% level while others are not. I did perform a likelihood-ratio test, and it tells me that the brand variable as a whole is significant.

How can I tell R that I only want separate estimates for brands 5, 10 and 12, since the others are not significant, meaning that policyholders with those brands should pay the same as a standard policyholder?

                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.92426    0.10172 -18.916  < 2e-16 ***
zone2C              0.16620    0.05799   2.866  0.00416 **
zone2D              0.42580    0.05946   7.161 8.04e-13 ***
zone2E              0.57356    0.06088   9.421  < 2e-16 ***
zone2F              0.58382    0.13233   4.412 1.03e-05 ***
vehcut2[4,16)       0.09004    0.05096   1.767  0.07724 .
vehcut2[16,101)    -0.19546    0.09267  -2.109  0.03494 *
agecut1[26,31)     -0.51136    0.10015  -5.106 3.29e-07 ***
agecut1[31,41)     -0.59369    0.08502  -6.983 2.89e-12 ***
agecut1[41,51)     -0.58597    0.08455  -6.930 4.21e-12 ***
agecut1[51,61)     -0.67614    0.08734  -7.741 9.85e-15 ***
agecut1[61,71)     -0.70625    0.09992  -7.068 1.57e-12 ***
agecut1[71,81)     -0.76348    0.11806  -6.467 1.00e-10 ***
agecut1[81,101)    -0.96703    0.23006  -4.203 2.63e-05 ***
as.factor(brand)2   0.02324    0.05663   0.410  0.68154
as.factor(brand)3   0.11332    0.07796   1.454  0.14606
as.factor(brand)4  -0.09019    0.11436  -0.789  0.43032
as.factor(brand)5   0.16641    0.08982   1.853  0.06392 .
as.factor(brand)6  -0.14618    0.11194  -1.306  0.19158
as.factor(brand)10  0.24718    0.11889   2.079  0.03761 *
as.factor(brand)11  0.22740    0.13854   1.641  0.10072
as.factor(brand)12 -0.15984    0.07034  -2.272  0.02306 *
as.factor(brand)13  0.21873    0.13721   1.594  0.11092
as.factor(brand)14 -0.25814    0.27270  -0.947  0.34384
fuelE              -0.16247    0.04202  -3.867  0.00011 ***
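As a quick arithmetic check on the brand rows above, the sketch below copies the posted estimates and p-values into a data frame, converts each estimate into a multiplicative relativity with exp(), and lists which brands clear the 5% and 10% thresholds. It assumes a log link (the glm.nb default) and that brand 1, the level missing from the output, is the baseline; both are assumptions, not stated in the question.

```r
# Brand coefficients copied from the summary output above
brand_coefs <- data.frame(
  brand    = c(2, 3, 4, 5, 6, 10, 11, 12, 13, 14),
  estimate = c(0.02324, 0.11332, -0.09019, 0.16641, -0.14618,
               0.24718, 0.22740, -0.15984, 0.21873, -0.25814),
  p_value  = c(0.68154, 0.14606, 0.43032, 0.06392, 0.19158,
               0.03761, 0.10072, 0.02306, 0.11092, 0.34384)
)

# Under a log link, exp(estimate) is the claim-frequency multiplier
# relative to the (assumed) baseline brand
brand_coefs$relativity <- round(exp(brand_coefs$estimate), 3)

sig_5  <- brand_coefs$brand[brand_coefs$p_value < 0.05]  # 10 12
sig_10 <- brand_coefs$brand[brand_coefs$p_value < 0.10]  # 5 10 12
print(brand_coefs)
```

In the posted output only brands 10 and 12 pass the 5% threshold; brand 5 (p = 0.064) passes only at 10%, so keeping 5, 10 and 12 amounts to a 10% cut-off.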

Thank you!

You could recode your brand variable as follows:

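library(dplyr)

# Keep brands 5, 10 and 12 as their own levels and pool every other brand
# (including the current baseline) into "Other", the new reference level
data <- data %>%
  mutate(
    brand = case_when(
      brand == 5  ~ "5",
      brand == 10 ~ "10",
      brand == 12 ~ "12",
      TRUE        ~ "Other"
    ),
    # This second assignment sees the recoded values and puts "Other" first
    brand = factor(brand, levels = c("Other", "5", "10", "12"))
  )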

and then re-run the model, using brand (now already a factor) in place of as.factor(brand).
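Pooling brands is itself a restriction that can be tested: the pooled model is nested within the model with one coefficient per brand, so anova() on the two negative binomial fits gives a likelihood-ratio test of whether pooling loses significant fit. A minimal sketch on simulated data, assuming the same offset(log(exposure)) negative binomial set-up; the data frame sim is hypothetical, the recoding uses a base-R %in% equivalent of the case_when() call, and it goes into a new column brand_grp so that brand stays available for the full model.

```r
library(MASS)  # glm.nb()

set.seed(7)
n <- 20000
# Hypothetical portfolio: only brands 5, 10 and 12 truly differ from the rest
sim <- data.frame(
  exposure = runif(n, 0.1, 1),
  brand    = sample(c(1:6, 10:14), n, replace = TRUE)
)
log_rate <- 0.2 * (sim$brand == 5) + 0.25 * (sim$brand == 10) - 0.2 * (sim$brand == 12)
sim$claims <- rnbinom(n, size = 1.5, mu = sim$exposure * exp(log_rate))

# Base-R version of the recoding: keep 5, 10, 12 and pool the rest into "Other"
keep <- c(5, 10, 12)
sim$brand_grp <- factor(
  ifelse(sim$brand %in% keep, as.character(sim$brand), "Other"),
  levels = c("Other", "5", "10", "12")
)

# Full model (one coefficient per brand) and pooled model
nb_all    <- glm.nb(claims ~ factor(brand) + offset(log(exposure)), data = sim)
nb_pooled <- glm.nb(claims ~ brand_grp + offset(log(exposure)), data = sim)

# Likelihood-ratio test of the pooling restriction (10 - 3 = 7 df here)
print(anova(nb_pooled, nb_all))
```

A large Pr(Chi) suggests the pooled tariff fits about as well as the full one. Choosing the levels by their own p-values and then testing the pooled model on the same data makes that test optimistic.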

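Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.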

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM