plm regression fixed effects on two variables

I have the following simplified data frame:

problem <- data.frame(
           stringsAsFactors = FALSE,
                fkeycompany = c("0000001961",
                                "0000003570","0000003570","0000003570",
                                "0000003570","0000003570","0000003570",
                                "0000003570","0000004187","0000004187","0000004187",
                                "0000004187","0000016058","0000022872",
                                "0000022872","0000022872","0000022872","0000024071",
                                "0000050471","0000052971","0000052971",
                                "0000056679","0000058592","0000058592","0000058592",
                                "0000063330","0000099047","0000099047",
                                "0000099047","0000316206","0000316537",
                                "0000319697","0000351917","0000351917","0000351917",
                                "0000356037","0000356037","0000356037",
                                "0000700815","0000700815","0000700815","0000700815",
                                "0000704415","0000704415","0000704415",
                                "0000705003","0000720154","0000720154","0000720154",
                                "0000720154"),
                 fiscalyear = c(2018,2002,
                                2002,2004,2006,2007,2007,2014,2005,2005,
                                2009,2017,2003,2002,2004,2004,2010,2002,
                                2016,2008,2008,2002,2005,2005,2010,2014,
                                2000,2005,2005,2002,2002,2001,2005,2005,
                                2006,2007,2012,2015,2006,2006,2007,2008,
                                2003,2014,2014,2000,2004,2006,2008,2013),
           zmijewskiscore = c(-0.295998372490631,-3.0604522838509,-3.0604522838509,
                                -9.70437199970406,-0.836774487816746,
                                0.500903351523752,0.500903351523752,-1.29210741224579,
                                -1.96529713996165,-1.96529713996165,
                                -1.60831783946871,-2.12343231229296,-3.99767761748961,
                                0.561261861396196,4.13793269655047,4.13793269655047,
                                5.61803398400963,-0.000195582736436772,
                                -3.93766039340527,-0.540037039625719,
                                -0.540037039625719,-1.93767533120689,-4.54446419505987,
                                -4.54446419505987,1.94389244672183,
                                0.941272649148121,-3.88427264672157,-0.342812414189714,
                                -0.342812414189714,-1.35074505582686,
                                -4.52746658422071,-0.130671284507204,-0.223517713694019,
                                -0.223517713694019,0.0149617517859735,
                                -2.95100357094774,-2.55146691134187,-1.86846592111008,
                                2.92283100206773,2.92283100206773,
                                4.65325023636937,6.1585365469118,-4.54449586848866,
                                -1.49969162335521,-1.49969162335521,-3.34071706450412,
                                -1.72382101559976,-1.53076052307727,
                                -1.77582320023177,-1.57280701642882),
           lloss = c(0,1,1,1,1,
                     1,1,1,0,0,0,1,0,0,1,1,1,1,0,1,1,
                     1,0,0,1,0,0,1,1,0,0,1,1,1,1,0,0,
                     1,1,1,1,1,0,1,1,0,1,1,1,0),
  GCO_prev = c(1,1,1,0,0,
               0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,
               0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
               0,0,0,0,1,0,0,0,0,0,0,0,0),
  GCO = c(0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,
          0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,
          0,0,0,1,1,0,0,0,0,0,0,0,0),
  industry = c(9,5,5,5,5,
               5,5,5,6,6,6,6,9,9,9,9,9,6,9,6,6,
               9,8,8,8,8,9,9,9,9,8,9,5,5,5,9,9,
               9,6,6,6,6,9,9,9,9,9,9,9,9))

I would like to run a plm regression on this with fixed effects on year and industry.

library(plm)
summary(plm(GCO ~ GCO_prev + lloss + zmijewskiscore, index=c("fiscalyear", "industry"), data=problem, model="within" ))

However, I get this error when running it:

Error in pdim.default(index[[1L]], index[[2L]]) : 
  duplicate couples (id-time)
In addition: Warning message:
In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")

I do not quite know how to fix this. If I understand correctly, it happens because more than one company (fkeycompany code) can share the same industry and fiscal year. For industry = 9, for example, two rows (fkeycompany 0000016058 and 0000704415) have fiscalyear = 2003. At least, that is what I think the issue is; am I wrong? I believe the same thing happens for more industries and years in my main dataset. How do I fix this error?

Besides this issue, is the code I am running correct? Am I actually regressing with year and industry fixed effects?

Given your data, the observational unit of the panel is the firm (fkeycompany). You might want to add industry as another fixed effect, but it is certainly not the time index (the time index goes into the second slot of the index argument, and I will assume it is fiscalyear). There are plenty of questions with answers on this topic. Also, do read the package's first vignette, where the data specification for the index argument is explained.

I suggest converting to a pdata.frame first.

However, there are duplicate combinations of fkeycompany and fiscalyear. See the code below: any value greater than 1 in the output of table points to a duplicated combination.

library(plm)
pdat.problem <- pdata.frame(problem, index = c("fkeycompany", "fiscalyear"))
#> Warning in pdata.frame(problem, index = c("fkeycompany", "fiscalyear")): duplicate couples (id-time) in resulting pdata.frame
#>  to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
table(index(pdat.problem), useNA = "ifany")
#>             fiscalyear
#> fkeycompany  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2012 2013
#>   0000001961    0    0    0    0    0    0    0    0    0    0    0    0    0
#>   0000003570    0    0    2    0    1    0    1    2    0    0    0    0    0
#>   0000004187    0    0    0    0    0    2    0    0    0    1    0    0    0
#>   0000016058    0    0    0    1    0    0    0    0    0    0    0    0    0
#>   0000022872    0    0    1    0    2    0    0    0    0    0    1    0    0
#>   0000024071    0    0    1    0    0    0    0    0    0    0    0    0    0
#>   0000050471    0    0    0    0    0    0    0    0    0    0    0    0    0
#>   0000052971    0    0    0    0    0    0    0    0    2    0    0    0    0
#>   0000056679    0    0    1    0    0    0    0    0    0    0    0    0    0
#>   0000058592    0    0    0    0    0    2    0    0    0    0    1    0    0
#>   0000063330    0    0    0    0    0    0    0    0    0    0    0    0    0
#>   0000099047    1    0    0    0    0    2    0    0    0    0    0    0    0
#>   0000316206    0    0    1    0    0    0    0    0    0    0    0    0    0
#>   0000316537    0    0    1    0    0    0    0    0    0    0    0    0    0
#>   0000319697    0    1    0    0    0    0    0    0    0    0    0    0    0
#>   0000351917    0    0    0    0    0    2    1    0    0    0    0    0    0
#>   0000356037    0    0    0    0    0    0    0    1    0    0    0    1    0
#>   0000700815    0    0    0    0    0    0    2    1    1    0    0    0    0
#>   0000704415    0    0    0    1    0    0    0    0    0    0    0    0    0
#>   0000705003    1    0    0    0    0    0    0    0    0    0    0    0    0
#>   0000720154    0    0    0    0    1    0    1    0    1    0    0    0    1
#>             fiscalyear
#> fkeycompany  2014 2015 2016 2017 2018
#>   0000001961    0    0    0    0    1
#>   0000003570    1    0    0    0    0
#>   0000004187    0    0    0    1    0
#>   0000016058    0    0    0    0    0
#>   0000022872    0    0    0    0    0
#>   0000024071    0    0    0    0    0
#>   0000050471    0    0    1    0    0
#>   0000052971    0    0    0    0    0
#>   0000056679    0    0    0    0    0
#>   0000058592    0    0    0    0    0
#>   0000063330    1    0    0    0    0
#>   0000099047    0    0    0    0    0
#>   0000316206    0    0    0    0    0
#>   0000316537    0    0    0    0    0
#>   0000319697    0    0    0    0    0
#>   0000351917    0    0    0    0    0
#>   0000356037    0    1    0    0    0
#>   0000700815    0    0    0    0    0
#>   0000704415    2    0    0    0    0
#>   0000705003    0    0    0    0    0
#>   0000720154    0    0    0    0    0
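
The duplicated combinations can also be listed and removed with base R. The following is a minimal sketch on a made-up data frame (`toy`, with invented values) that only mimics the structure of `problem`. It assumes the repeated rows are exact copies of each other, which looks to be the case in the sample data but should be checked on the real dataset before dropping anything.

```r
# Toy data with the same key columns as `problem` (values are hypothetical).
toy <- data.frame(
  fkeycompany = c("A", "A", "A", "B", "B", "C"),
  fiscalyear  = c(2002, 2002, 2004, 2005, 2005, 2003),
  GCO         = c(0, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

key <- c("fkeycompany", "fiscalyear")

# Flag every row whose (firm, year) pair occurs more than once,
# including the first occurrence of each pair.
is_dup <- duplicated(toy[key]) | duplicated(toy[key], fromLast = TRUE)
print(toy[is_dup, ])

# If the repeated rows are exact copies, unique() keeps one of each.
toy_clean <- unique(toy)

# Each (firm, year) pair should now appear only once.
stopifnot(anyDuplicated(toy_clean[key]) == 0)
```

If the repeated rows differ in some column, `unique()` will not remove them, and you would have to decide which row to keep (or how to aggregate) before calling `pdata.frame()`.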

Once that is fixed, you will be able to run a model along these lines. For a time fixed-effects model:

model <- plm(GCO ~ GCO_prev + lloss + zmijewskiscore, data = pdat.problem, model="within", effect = "time")

Or a time fixed-effects model with industry as an additional fixed effect:

model2 <- plm(GCO ~ GCO_prev + lloss + zmijewskiscore + factor(industry), data = pdat.problem, model="within", effect = "time")
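
To see what "year and industry fixed effects" means here, it can help to compare against a plain dummy-variable regression. The sketch below uses simulated data (the data frame `sim` and the columns `firm`, `year`, `x`, `y` are made up for illustration, not taken from the question) and fits `lm()` with explicit year and industry dummies. The slope on `x` from this regression should match the one from the `plm()` call with `effect = "time"` and `factor(industry)`. The plm part only runs if the package is installed.

```r
set.seed(1)

# Simulated balanced panel: 20 hypothetical firms observed over 5 years.
sim <- expand.grid(
  firm = sprintf("F%02d", 1:20),
  year = 2001:2005,
  stringsAsFactors = FALSE
)
# Hypothetical time-invariant industry code (1, 2 or 3) per firm.
sim$industry <- (as.integer(factor(sim$firm)) %% 3) + 1
sim$x <- rnorm(nrow(sim))
sim$y <- 0.5 * sim$x + 0.2 * sim$industry +
  0.1 * (sim$year - 2001) + rnorm(nrow(sim))

# Dummy-variable regression: one dummy set per year and per industry.
lsdv <- lm(y ~ x + factor(industry) + factor(year), data = sim)
print(coef(lsdv)["x"])

# Same specification via plm, if available: firm is the individual index,
# year is the time index, and the year effects are swept out by effect = "time".
if (requireNamespace("plm", quietly = TRUE)) {
  fe <- plm::plm(y ~ x + factor(industry), data = sim,
                 index = c("firm", "year"),
                 model = "within", effect = "time")
  print(coef(fe)["x"])
}
```

Note that neither specification includes firm fixed effects. If those are wanted as well, industry dummies that do not vary within a firm would be absorbed by the firm effects.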
