How do I aggregate data for the glm() function in R?

I am trying to estimate relativities for insurance pricing using a GLM. I'm using the freMTPL data from CASdatasets. ClaimNb is my response and Exposure is my exposure; I'm interested in the claim frequency ClaimNb/Exposure.

After dividing wide-ranging variables such as driver age (18-99) into a few groups (e.g. 5 categories), I grouped the data using

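library(dplyr)
# One row per combination of rating factors, summing claims and exposure
data_grouped_freq <- data_freq4 %>%
  group_by(Power, Brand, Gas, Region, CarAge_cat, DriverAge_cat, Density_cat) %>%
  summarise(ClaimNb  = sum(ClaimNb),
            Exposure = sum(Exposure),
            .groups  = "drop")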

after which I fit the model with

model_freq <- glm(ClaimNb ~ Power + Brand + Gas + Region + CarAge_cat + DriverAge_cat + Density_cat,
                  family = poisson, data = data_grouped_freq, weights = Exposure)
summary(model_freq)

to fit the GLM. The result is:

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-255.241    -2.634    -0.929    -0.202   199.629  

Coefficients:
                                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)                              4.8629082  0.0011698 4156.99   <2e-16 ***
Powerd                                  -0.4660131  0.0014613 -318.90   <2e-16 ***
Powere                                  -0.7155983  0.0013723 -521.44   <2e-16 ***
Powerg                                  -0.4131892  0.0010905 -378.89   <2e-16 ***
...
RegionPoitou-Charentes                  -2.3903228  0.0052288 -457.14   <2e-16 ***
CarAge_cat1                             -1.2547176  0.0021645 -579.68   <2e-16 ***
DriverAge_cat1                          -0.7913098  0.0022811 -346.90   <2e-16 ***
DriverAge_cat2                          -1.2886084  0.0024688 -521.96   <2e-16 ***

I know this is wrong because DriverAge_cat1 has a higher ClaimNb/Exposure ratio and should therefore get a relativity > 1, which exp(-18.9082) is not. (The ClaimNb/Exposure ratio is 0.134 for cat1, compared with 0.071 for the reference level of DriverAge_cat.)
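As a quick check of the expected direction, the one-way relativity implied by the two quoted frequencies can be computed directly. This is only the unadjusted ratio; a GLM coefficient also adjusts for the other rating factors, so it need not match exactly, but a gap this large in the positive direction makes a strongly negative coefficient suspicious.

```r
# Observed claim frequencies quoted above (claims per unit of exposure)
freq_cat1 <- 0.134
freq_ref  <- 0.071

relativity <- freq_cat1 / freq_ref  # about 1.89
log_rel    <- log(relativity)       # about 0.64, so a rate model's coefficient should be positive
```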

Can someone explain what I am doing wrong? Could the many cells with zero claims be causing problems? Or am I using the weights incorrectly? There are 14,661 cells in total across the 7 variables.

For a Poisson rate model, you should pass the exposure to glm() through the offset argument, on the log scale, instead of as weights:

model_freq <- glm(ClaimNb ~ Power + Brand + Gas + Region + CarAge_cat + DriverAge_cat + Density_cat,
                  family = poisson, data = data_grouped_freq, offset = log(Exposure))
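To illustrate the difference, here is a small simulation sketch in base R. Everything in it is hypothetical: the data are simulated rather than taken from freMTPL, and the names age_cat, region, claims, exposure, cells and fit_* are made up for the example. The two true frequencies reuse the 0.134 and 0.071 quoted in the question, and the riskier level is given much less exposure, which in this setup is enough to make the weights-only model return a negative coefficient.

```r
# Hypothetical simulated portfolio (NOT the freMTPL data): an "age" factor whose
# level "1" is riskier but rarer, plus a "region" factor with its own effect.
set.seed(42)
n <- 1e5
age_cat    <- factor(sample(c("0", "1"), n, replace = TRUE, prob = c(0.8, 0.2)))
region     <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
exposure   <- runif(n, min = 0.1, max = 1)            # policy-years
base_freq  <- ifelse(age_cat == "1", 0.134, 0.071)    # true claims per policy-year
region_rel <- c(A = 1.0, B = 1.3, C = 0.8)[as.character(region)]
claims     <- rpois(n, exposure * base_freq * region_rel)

# Collapse to one row per rating cell, as in the question (base R aggregate)
cells <- aggregate(cbind(claims, exposure) ~ age_cat + region,
                   data = data.frame(age_cat, region, claims, exposure), FUN = sum)

# Rate model: log(exposure) enters the linear predictor with coefficient fixed at 1
fit_offset <- glm(claims ~ age_cat + region, family = poisson,
                  data = cells, offset = log(exposure))

# Same model with the offset written inside the formula
fit_formula <- glm(claims ~ age_cat + region + offset(log(exposure)),
                   family = poisson, data = cells)

# Also equivalent: frequency as the response, weighted by exposure.
# dpois() warns about non-integer responses when glm computes the AIC.
fit_rate <- suppressWarnings(
  glm(claims / exposure ~ age_cat + region, family = poisson,
      data = cells, weights = exposure))

# The question's specification: raw counts weighted by exposure, no offset
fit_weights <- glm(claims ~ age_cat + region, family = poisson,
                   data = cells, weights = exposure)

exp(coef(fit_offset)[["age_cat1"]])   # estimates the true relativity 0.134 / 0.071
exp(coef(fit_weights)[["age_cat1"]])  # compares claim counts per cell, not frequencies
```

On real data, the same comparison can be made by fitting the question's model both ways and comparing exp(coef(...)) for DriverAge_cat1 with the one-way ratio.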

Passing offset = log(Exposure) instead of weights = Exposure should solve your issue.
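The reason, sketched with notation introduced here (y_i is the total ClaimNb in cell i, E_i its total Exposure, x_i its row of rating-factor indicators, beta the coefficient vector): the offset puts log E_i into the linear predictor with its coefficient fixed at 1, so the remaining terms model the claim frequency.

```latex
y_i \sim \operatorname{Poisson}(\mu_i), \qquad
\log \mu_i = \log E_i + x_i^{\top}\beta
\quad\Longrightarrow\quad
\frac{\mu_i}{E_i} = \exp\!\left(x_i^{\top}\beta\right)
```

Each exp(beta_j) is therefore a multiplicative relativity on ClaimNb/Exposure. With weights = Exposure and no offset, the model is log(mu_i) = x_i'beta for the expected count itself; the weights only scale each cell's contribution to the log-likelihood, so the coefficients compare claim counts per cell, which also reflect how much exposure each cell holds. That is one plausible reason a riskier but smaller group can come out with a negative coefficient.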

