
Regression of dummy variables in R

I am new to R and I am trying to perform a regression on my dataset, which includes, e.g., monthly sales data of a company in different countries over multiple years.

In other statistical programs, in order to control for quarterly cyclical movement of sales as well as for regional (country) differences, I would create dummy variables indicating, e.g., the quarters and countries where sales are made.

My questions:

1) I saw that in R you can set a variable's type to 'Factor'. In this case, do I still need to create dummy variables indicating countries and months/quarters, or does R already treat factor variables differently and automatically convert them to dummies in the background?

2) If the above is not the case, and I do need to recode my values into 0/1 dummies, is there a neat standard way to do it in R?

Thanks a lot for your help and have a nice day!

Trgovec

Yes, R automatically treats factor variables as reference-coded dummies, so there is nothing else you need to do. If you run your regression, you should see the typical dummy-variable output for those factors: one coefficient per level, except for the reference level.
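As a minimal sketch of this behavior, assume a made-up data frame `sales_df` with hypothetical columns `sales`, `quarter`, `country`, and `year` (none of these names or values come from the question). The fit shows that `lm()` expands each factor into dummies and absorbs the first level of each into the intercept:

```r
# Hypothetical toy data: column names and values are illustrative only
set.seed(1)
sales_df <- expand.grid(
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  country = c("DE", "FR", "SI"),
  year    = 2015:2017
)
sales_df$sales <- rnorm(nrow(sales_df), mean = 100, sd = 10)

# Mark the categorical columns as factors (a no-op if they already are)
sales_df$quarter <- factor(sales_df$quarter)
sales_df$country <- factor(sales_df$country)

# No manual dummy columns are needed; the formula handles the factors
fit <- lm(sales ~ quarter + country, data = sales_df)

# One coefficient per non-reference level; Q1 and DE are the reference levels
names(coef(fit))
```

The reference level is simply the first level of each factor, so reordering the levels changes which category the other coefficients are compared against.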

Note, however, that there are several ways of coding categorical variables, so you might want to choose a different coding using the C() function. You can find good details here. Also, if you need more control, there are packages devoted to helping you create dummy variables, such as the dummies package.
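To illustrate what a different coding looks like, here is a small sketch using the built-in `mtcars` data. The variable names `gear_f`, `gear_sum`, and `fit_sum` are made up for this example, and sum-to-zero contrasts are just one possible alternative to the default treatment coding:

```r
gear_f <- factor(mtcars$gear)

# Default coding: treatment (reference) contrasts, first level is the baseline
contrasts(gear_f)

# Same factor, but with sum-to-zero contrasts attached via C()
gear_sum <- C(gear_f, contr.sum)
contrasts(gear_sum)

# The fit still has two gear coefficients, but they are now deviations
# from the mean of the level means rather than from the reference level
fit_sum <- lm(mpg ~ gear_sum, data = mtcars)
coef(fit_sum)
```

The fitted values are the same under either coding; only the interpretation of the individual coefficients changes.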

R will automatically create the corresponding design matrix (via model.matrix()) from your formula, e.g.:

lm(mpg ~ factor(gear) + I(cyl > 4), data = mtcars)
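If you want to check which dummy columns R actually built for a fit like this, one option is to pull the design matrix back out of the fitted model. The sketch below assumes the same formula as above; `fit_mt` is a hypothetical name:

```r
fit_mt <- lm(mpg ~ factor(gear) + I(cyl > 4), data = mtcars)

# Columns of the design matrix that lm() built behind the scenes:
# an intercept, two gear dummies (gear = 3 is the reference),
# and one dummy for the logical condition cyl > 4
colnames(model.matrix(fit_mt))

# The first few rows of the 0/1 coding
head(model.matrix(fit_mt))
```

The logical expression wrapped in `I()` is handled like a two-level factor, with `FALSE` as the reference.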

If you would like to create the dummies yourself, take a look at model.matrix(). Removing the intercept (- 1) gives one column per factor level:

head(model.matrix(~ - 1 + factor(gear), data = mtcars))

                    factor(gear)3 factor(gear)4 factor(gear)5
Mazda RX4                       0             1             0
Mazda RX4 Wag                   0             1             0
Datsun 710                      0             1             0
Hornet 4 Drive                  1             0             0
Hornet Sportabout               1             0             0
Valiant                         1             0             0
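The difference between keeping and dropping the intercept can be sketched as follows, again on `mtcars`. The names `X_ref`, `X_full`, and `mtcars_dummies` are hypothetical and only used for this illustration:

```r
# With the intercept kept, the first level (gear = 3) is dropped as the reference
X_ref <- model.matrix(~ factor(gear), data = mtcars)
colnames(X_ref)

# Without the intercept (- 1), every level gets its own 0/1 column
X_full <- model.matrix(~ - 1 + factor(gear), data = mtcars)
colnames(X_full)

# In the full coding, each row has exactly one 1
all(rowSums(X_full) == 1)

# Attach the dummy columns to the original data if you need them explicitly
mtcars_dummies <- cbind(mtcars, X_full)
dim(mtcars_dummies)
```

Note that using all three dummy columns together with an intercept in a regression would make the design matrix rank-deficient, which is why the reference-coded version drops one level.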

