Incorporating a dummy variable into a regression with R

I'm wondering how I can run a fixed-effects regression in which some data points belong to the aggregate dummy category "nonindustrialized" while the others keep their individual country names. I first ran a regression:

reg1 <- lm(birthrate ~ country*year)

and would like to collapse the nonindustrialized countries in "country" into a single category while leaving all the industrialized data points disaggregated. I made a logical TRUE/FALSE column for industrialized, but I can't figure out how to use it without ending up with just two aggregated groups. Is there a way to aggregate only the FALSE points and keep all the other points as individual countries?

Thank you!

It's not entirely clear from your question, but I'm assuming your data frame is in long form and looks something like this:

# Toy data in long form: one row per observation
country <- rep(c("A", "B", "C"), 4)
birthplace <- rep(c("x", "y"), 6)
year <- 2001:2012
df <- data.frame(country, birthplace, year)

> df
   country birthplace year
1        A          x 2001
2        B          y 2002
3        C          x 2003
4        A          y 2004
...

In that case, you can easily add a new column that either labels a row as nonindustrialized or else keeps the original country value (here, countries A and B stand in for the nonindustrialized ones):

df$country.agg <- ifelse(df$country %in% c("A", "B"), "nonindustrialized", as.character(df$country))

Now you can use this column in your regression in place of country, which pools all nonindustrialized countries into one category and keeps every other country as its own level. Is this what you're looking for?
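If you already have the logical column described in the question, the same idea might look roughly like the sketch below. This is only an illustration under assumptions: the column name `industrialized`, the object names `dat` and `reg2`, and all the values are made up here (the birth rates are simulated), so adapt them to your own data.

```r
# Hypothetical data: column names follow the question, values are simulated.
set.seed(1)
dat <- data.frame(
  country        = rep(c("A", "B", "C", "D"), each = 5),
  year           = rep(2001:2005, times = 4),
  industrialized = rep(c(FALSE, FALSE, TRUE, TRUE), each = 5)
)
dat$birthrate <- rnorm(nrow(dat), mean = 15, sd = 2)

# Keep the country name where industrialized is TRUE, pool the FALSE rows.
dat$country.agg <- factor(ifelse(dat$industrialized,
                                 as.character(dat$country),
                                 "nonindustrialized"))

# Optional: make the pooled group the reference level for the dummies.
dat$country.agg <- relevel(dat$country.agg, ref = "nonindustrialized")

# Same model as reg1, but with the partly aggregated country factor.
reg2 <- lm(birthrate ~ country.agg * year, data = dat)
levels(dat$country.agg)
```

The factor ends up with one pooled level plus one level per industrialized country, so `lm()` estimates a separate intercept and year slope for each industrialized country relative to the pooled group. Setting the reference level with `relevel()` is just one possible choice; leaving the default ordering works too.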
