I am trying to run a difference-in-differences regression with fixed effects. The regression is meant to estimate the impact of participating in a televised sports event on the social media follower count of the participating teams, compared to teams that did not participate.
My data looks like this: [Data][1]
The dependent variable is Rate_Percent, the growth rate of Facebook likes, which is calculated as follows:
    Dataset_FB <- Dataset_FB %>%
      group_by(ID) %>%
      mutate(Diff_Growth = FBLikes - lag(FBLikes),
             Rate_Percent = Diff_Growth / lag(FBLikes) * 100)
Teilnahme ("participation") is a dummy variable that distinguishes participants from non-participants, and Hauptrunde ("main round") is a dummy variable that indicates the treatment period (0 before the treatment, 1 after the treatment). I am trying to include ID, Uhrzeit ("time of day") and Spieltag ("match day") as fixed effects to control for club and time differences.
My regression looks like this:
    reg <- lm(Rate_Percent ~ Teilnahme + Hauptrunde + Teilnahme*Hauptrunde + factor(ID) + factor(Uhrzeit) + factor(Spieltag), data = Dataset_FB)
Now, my questions are as follows:
The summary looks like this:
    lm(formula = Rate_Percent ~ Teilnahme + Hauptrunde + Teilnahme *
        Hauptrunde + factor(ID) + factor(Uhrzeit) + factor(Spieltag),
        data = Dataset_FB)

    Residuals:
        Min      1Q  Median      3Q     Max
    -0.2834 -0.0343 -0.0111  0.0092  4.9302

    Coefficients: (6 not defined because of singularities)
                           Estimate Std. Error t value Pr(>|t|)
    (Intercept)           0.0266970  0.0125098   2.134  0.03288 *
    Teilnahme             0.0020571  0.1662742   0.012  0.99013
    Hauptrunde           -0.0158433  0.0060631  -2.613  0.00900 **
    factor(ID)8          -0.0344717  0.0171467  -2.010  0.04443 *
    factor(ID)25         -0.0155100  0.1662745  -0.093  0.92568
    factor(ID)56          0.0122209  0.0171467   0.713  0.47604
    factor(ID)69         -0.0093248  0.1662745  -0.056  0.95528
    factor(ID)90         -0.0037743  0.0171467  -0.220  0.82578
    factor(ID)93          0.0948638  0.0171467   5.532 3.29e-08 ***
    factor(ID)103         0.0117689  0.0171467   0.686  0.49251
    factor(ID)115         0.0479442  0.0171467   2.796  0.00519 **
    factor(ID)166        -0.0129542  0.0171467  -0.755  0.44998
    factor(ID)364        -0.0112018  0.0171467  -0.653  0.51359
    factor(ID)373        -0.0111296  0.0171467  -0.649  0.51631
    factor(ID)490        -0.0231408  0.0171467  -1.350  0.17720
    factor(ID)752        -0.0064241  0.0171467  -0.375  0.70793
    factor(ID)907         0.1333400  0.0171467   7.776 8.75e-15 ***
    factor(ID)951         0.0087327  0.0171467   0.509  0.61057
    factor(ID)996        -0.0105943  0.0171467  -0.618  0.53669
    factor(ID)1238        0.0076285  0.0171467   0.445  0.65641
    factor(ID)1315        0.0304732  0.1662745   0.183  0.85459
    factor(ID)1316        0.1290605  0.0171467   7.527 5.98e-14 ***
    factor(ID)1400        0.0038137  0.0171467   0.222  0.82400
    factor(ID)1401       -0.0135700  0.0171467  -0.791  0.42874
    factor(ID)1712       -0.0001285  0.0171467  -0.007  0.99402
    factor(ID)3417        0.0053766  0.0171467   0.314  0.75386
    factor(ID)5646        0.0052521  0.0171467   0.306  0.75939
    factor(ID)6273       -0.0134096  0.0171467  -0.782  0.43422
    factor(ID)7679       -0.0104365  0.0171467  -0.609  0.54277
    factor(ID)9029               NA         NA      NA       NA
    factor(ID)10213      -0.0441121  0.0171467  -2.573  0.01012 *
    factor(ID)26957      -0.0287541  0.0171700  -1.675  0.09405 .
    factor(ID)29988       0.1015109  0.1662745   0.611  0.54155
    factor(ID)40373       0.0203831  0.0171467   1.189  0.23459
    factor(Uhrzeit)1530   0.0206731  0.1653880   0.125  0.90053
    factor(Uhrzeit)1830          NA         NA      NA       NA
    factor(Uhrzeit)2045          NA         NA      NA       NA
    factor(Spieltag)NA           NA         NA      NA       NA
    factor(Spieltag)Sa           NA         NA      NA       NA
    factor(Spieltag)So           NA         NA      NA       NA
    Teilnahme:Hauptrunde  0.0053874  0.0085752   0.628  0.52987
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1649 on 5885 degrees of freedom
      (32 observations deleted due to missingness)
    Multiple R-squared:  0.07278,	Adjusted R-squared:  0.06742
    F-statistic: 13.59 on 34 and 5885 DF,  p-value: < 2.2e-16
[1]: https://i.stack.imgur.com/ZBAqL.png
The output is correct and you did nothing wrong per se, but there are more elegant ways to run the fixed effects regression.
Yes, although the fixed effects will not be consistently estimated in the model.
Here, the singularities mean that some regressors are perfectly collinear with others: a few levels of ID, Uhrzeit and Spieltag carry no variation that is not already captured by the remaining dummies (for instance, Teilnahme is constant within each ID), so the model cannot estimate a separate coefficient for them and reports NA.
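To see where such NA coefficients come from, here is a minimal sketch on made-up data (the data frame `toy` and its columns `id`, `post`, `treated` and `y` are hypothetical, not your variables). The group dummy `treated` does not vary within a club, so it is an exact sum of club dummies and `lm` has to drop one column:

```r
# Toy panel: 3 clubs observed 4 times each (all names and values are made up).
toy <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  post = rep(c(0, 0, 1, 1), times = 3)
)

# 'treated' is constant within each club, like Teilnahme in the question.
toy$treated <- as.numeric(toy$id %in% c("b", "c"))

set.seed(1)
toy$y <- rnorm(nrow(toy))

# Same structure as the regression in the question: DiD terms plus club dummies.
fit <- lm(y ~ treated * post + factor(id), data = toy)

# Coefficients that could not be estimated because of perfect collinearity.
dropped <- names(which(is.na(coef(fit))))
print(coef(fit))
print(dropped)
```

The interaction term is still estimated, because it varies within clubs over time; only the redundant dummy is reported as NA.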
I would suggest having a look at two packages:
plm, which is the standard for panel data models. I am not 100% sure whether your data is a real panel (and whether you are actually estimating a diff-in-diff specification). You would have something like:
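    library(plm)
    # The (ID, Uhrzeit) index must identify each row uniquely; use your real time variable if it does not.
    pdata <- pdata.frame(Dataset_FB, index = c("ID", "Uhrzeit"))
    # Teilnahme * Hauptrunde expands to both main effects plus their interaction.
    plm(Rate_Percent ~ Teilnahme * Hauptrunde + factor(Spieltag),
        data = pdata, model = "within", effect = "twoways")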
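Since a panel model expects an individual index and a time index that together identify each row, a quick check along these lines may help you decide whether your data is a real panel. This is only a sketch on made-up data; `toy`, `id` and `time` are hypothetical names, and you would substitute your own data frame and index columns:

```r
# Hypothetical toy data: two clubs, with a repeated (id, time) pair for club 2.
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(1, 2, 3, 1, 2, 2)
)

# Number of rows that repeat an earlier (id, time) combination.
n_dup <- sum(duplicated(toy[c("id", "time")]))
print(n_dup)

# Observations per club; a balanced panel has the same count everywhere.
obs_per_id <- table(toy$id)
print(obs_per_id)
```

If the duplicate count is not zero, the chosen time variable does not index the observations, and a different one (for example a date) would be needed.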
felm (from the lfe package) is a great and easy-to-use alternative, where you specify the fixed-effect factors after a | in the formula:
    library(lfe)
    est <- felm(Rate_Percent ~ Teilnahme * Hauptrunde | ID + Uhrzeit + Spieltag,
                data = Dataset_FB)
As an explanation: the way these fixed-effects packages work is that they first transform your data (with a within transformation), basically subtracting the group averages. Thanks to this, we don't need to estimate the coefficients for the fixed effects explicitly, as your code does. So these other solutions are slightly neater and produce easier-to-read output, but numerically there shouldn't be any difference in the remaining coefficients.
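The equivalence can be checked by hand in base R. The following is a minimal sketch on simulated data (the data frame `toy`, its columns and the true slope of 2 are all assumptions made for illustration); it compares the dummy-variable regression with a regression on group-demeaned variables:

```r
set.seed(42)

# Simulated panel: 5 clubs with 10 observations each (made-up data).
toy <- data.frame(
  id = rep(1:5, each = 10),
  x  = rnorm(50)
)
# Outcome with a club-specific intercept and a true slope of 2.
toy$y <- 2 * toy$x + toy$id + rnorm(50)

# (a) Dummy-variable regression, as in the question.
b_dummy <- coef(lm(y ~ x + factor(id), data = toy))["x"]

# (b) Within transformation: subtract each club's mean, then regress without intercept.
toy$y_dm <- toy$y - ave(toy$y, toy$id)
toy$x_dm <- toy$x - ave(toy$x, toy$id)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = toy))["x_dm"]

# The two slope estimates agree up to floating-point error.
same_slope <- isTRUE(all.equal(unname(b_dummy), unname(b_within)))
print(same_slope)
```

Note that the standard errors from the hand-demeaned regression in (b) are not directly usable, because `lm` does not know that the group means were estimated; the dedicated packages apply the degrees-of-freedom correction for you.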