
Running multiple stepwise linear regression models to predict NA values in R

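I am running stepwise linear regression to predict missing values. I can do this one variable at a time, but I have a very large data frame with more than 50 variables and need a way to automate getting the fitted values for many variables. I know many statisticians dislike stepwise procedures, but I would still like to use them.

Below is the code I use to do this for a single variable: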

test <- data.frame(predict.lm(object = step(lm(dep_var1 ~ ind_var1 + ind_var2 + ind_var3, data = df1),direction = "both"), newdata = df1))

colnames(test) <- "dep_var1"

Sample data is below:

df1 <- structure(list(dep_var1 = c(NA_real_, NA_real_, 
-2.09123267205066, 0.230793085482842, 2.37381389867166, -1.254476456844, 
0.803358768774937, -0.193694287225052, 1.4135048896131, -1.01027931169849, 
-0.353471151423884, -1.8471429353131, 0.846656684067891, -0.577619029380873, 
1.56174835187537, -0.180654842356546, 0.606702067578114, 0.63196118363776, 
-2.07546608269867, -1.6981663767802, -2.37523932992292, 0.76639616724562, 
2.79632224479538, -2.83455947605957, -1.33255484820427, 1.13620307003978, 
0.0748723253449958, -0.971846570370541, 0.833084653739389, 1.22652791855451, 
-1.41360170749287, 1.56830155870067, -1.12470646556145, -0.0187794024628569, 
-0.423859330845611, -0.712475730126666, -0.188195097884893, -0.925214646951187, 
2.34270511007552, -1.93278147868247, 0.327538505404795, 0.631163864457143, 
-2.85767723932405, 1.75496256076676, -1.42847227988351, 2.7512047410972, 
-1.15934991023766, -1.54975291965205, -0.11032054745398, 1.92751343170804, 
0.789613141824792, -0.917519738054573, -0.952544104866665, -3.24167052431999, 
-0.52210553650643, 0.18239691875455, -3.21027452658145, -0.827625012712401, 
-0.26672819041463, -1.94823563624677, 2.63505186730208, 0.0366011774775348, 
2.65569794154129, -2.12446625497985, -1.27360207957464, 0.448158096131414, 
-2.49661319932106, 1.02489387271096, -1.08099011979409, -0.364521583133239, 
1.84812022254912, -1.97231278697627, -0.548672808444616, -2.66885146325586, 
-2.23320660644535, -1.34044182986747, -0.988382288011769, -0.945936400194469, 
-0.374814294872094, 0.962918718857577, -2.26590978712601, -0.932063294009854, 
1.13878640351243, -0.472148199947895, 0.372002078593101, 1.00490709225994, 
-2.48452188170382, -0.250170527558021, 0.922254020376051, 3.13691655377035, 
0.0872528229244095, 1.48719103494955, -0.994742032242124, -1.73988494786043, 
0.424588121740004, -2.41510577689421, -1.5841259205017, 2.34360206782046, 
0.535053007004022, -0.795024729905373), dep_var2 = c(2.07303849961519, 
-1.02627125901242, 2.00209093064551, 2.33854031704522, -1.94170342751993, 
1.29711275552946, -1.1573914248646, 2.77266492930927, 1.52318282862803, 
2.50533399732185, 2.18247552424418, 1.57070140547483, -1.80780160813424, 
0.36791214355129, -2.49767760388436, 0.385602175407397, 0.11990775524449, 
-0.277242508402587, -1.45086031801734, 3.77402161660446, -1.24358503248032, 
-3.16519765000204, -0.58250906528939, 1.04464047101027, 0.173724227542418, 
-3.27068834263146, -1.12633556290261, 1.26357853218466, 0.314211534228324, 
-0.585398043962647, -0.897440667747893, -0.483528806014744, -0.583023502992864, 
-1.96040591216907, 0.996014489963131, 1.71087323572918, 0.623006241001743, 
2.11174786637826, 0.420870966700236, -0.318425846406272, -0.902348953954844, 
-1.56791408364248, 2.24200780236017, 1.04557599992065, 1.37600483352856, 
-2.86817745599522, -1.0387333666576, 1.07953682410029, 0.191775638252006, 
-1.48865614959846, -1.76195773849034, -0.298594272403301, 0.235042377873754, 
0.0403724174579101, -1.2327030772748, -0.509896189671339, 1.79187808213233, 
0.508896870272482, 1.87215238243187, 5.42089769981591, 1.05336781075391, 
1.96701365084408, -2.26904993911809, -1.32806705070234, 0.284169651292081, 
3.02750536394422, 1.55475894954328, -1.39469699223261, -0.647098215723534, 
-1.86470919954381, 0.132124712418362, 0.794947727046341, 0.765112914503222, 
1.0562579736073, 0.379018770290438, -0.911880644497877, 1.3675121350016, 
-0.899376872411081, -2.36095033247759, 1.59497346648275, -0.541751418443624, 
-1.34500493840032, 2.12015805342449, 2.77354184178997, -3.96370880146096, 
-0.0967628116821005, 1.97876659343358, -1.77845530622916, 1.16590928446694, 
-0.106112277520016, 1.19636132483196, 1.60566951317693, 2.09590452462496, 
0.214460090479266, -1.87019786463146, 1.64600594683429, 0.213332757178706, 
-2.17935397786443, 2.21635976782075, -0.392555892448031), dep_var3 = c(0.616700731082951, 
4.16279558260156, 1.10940530392079, -2.8569223582772, 0.402520816282224, 
-1.04411931764913, -0.609172559785609, -3.20807626475815, -2.08381934294098, 
-1.57712938280433, -1.44209052953985, -0.352794093438308, -0.608327907097134, 
-2.25597485701099, 2.19386899842515, 0.396416957807837, 1.33246847256144, 
-0.0762686733985066, 0.464588471846464, 3.94769110440112, 1.68318663058877, 
1.10935304551582, -2.71677518211804, 1.59362361780755, -1.62129130253971, 
-0.127118607974366, -0.417026737550066, -0.241262097212425, -1.52296844320382, 
-2.56829334841815, 0.799132956325209, 0.220522383259441, 2.37490948964111, 
4.15215150868392, -0.812992593809876, -0.173256232772018, 1.71074725747611, 
-1.0216605970604, -2.02721169453559, -4.09137683106018, 0.0474862298692908, 
3.31122428784435, -0.109026136376674, -3.46365644884461, -1.35460817015094, 
-0.899169317402685, 2.79440901022252, -0.794037627815716, 2.59917986374591, 
-2.14467166749864, 1.70019936889493, 0.721183948988304, -0.102388950793829, 
0.417677247084431, -1.01294623403926, 0.530290499693695, -0.678407609540795, 
1.36678775280302, 0.0970122249348387, 0.984762058542595, -3.21893736068827, 
-0.176771833178864, 1.46524980459238, 5.09545403085887, 1.46390691826153, 
-2.28175042941279, 1.17844832995436, -0.51656608642314, 0.915840406252925, 
1.8162815506279, -0.838763232984826, -1.78425071852195, -2.02035769534564, 
1.94260379368071, 4.03367533975736, -0.89328282008572, -2.73980411204667, 
-0.664566579870786, 1.2743809088601, 1.217725543838, 0.33860561843341, 
-1.7583845390752, -3.82437030519712, -4.1251791941278, 2.16768888784062, 
0.0208230680948219, -1.47964005154307, 0.0435783517650753, -3.94727089909519, 
-0.818173043130464, -3.4742303828308, -0.941225010967932, -0.979536393425847, 
-0.818834044969523, 0.795467907282362, -0.929285918331344, 0.668127671169617, 
-0.254668928895892, -2.13424401943605, -2.29388988629311), ind_var1 = c(0.458454397686833, 
-0.128440463741865, 0.363604764506242, -0.0693474758868018, 1.72259605847845, 
1.69526675465286, -1.623924222505, 0.15126566544286, -1.93552451013567, 
-2.58683178733901, -0.233912306362039, -2.47192439188638, 0.620795754754641, 
-0.992480709929954, 0.482192425484265, -2.61563698833568, 0.0128550866026035, 
0.392025740980614, -0.0473362942736612, -2.64909215232388, -1.47622293773269, 
3.16190990221028, 3.49243154151446, -0.272928040177153, -0.761411336416013, 
2.64997041637778, 0.577458182483536, -2.42929594600083, -0.267243349065099, 
0.722347497120074, 1.74884020954902, -0.0348288966586645, -1.52719161170932, 
-0.933148290337328, -0.490447995741133, 0.655322312303463, -2.52750457266348, 
0.668092340207411, 0.585782768355766, -0.359703526704027, 1.65001495114651, 
0.660363284824336, 0.0862383898649589, -0.365574191100425, -2.16177422896681, 
3.89053917972807, -0.142261253218103, 0.707021521565601, 0.0227116811915725, 
-0.454014719282556, 3.08453484473708, -1.06212270847072, -0.399418638058533, 
-0.262910611084249, 1.93593096630764, -0.725649177240837, -1.17309612984748, 
-0.373437242782234, -0.680948834115372, -4.13059660441355, -0.0409060052137248, 
0.989037314169956, 1.2259749106443, -0.66115377935577, -1.51318623204637, 
0.708828930872304, 2.34078004259392, 2.55044212723072, 0.141264088851028, 
2.17300161541665, 0.788684015013957, -2.80016454552875, 0.907606363872277, 
-2.53767303689764, 0.430023970340317, 0.972560430691479, -0.57115769920932, 
0.675371714699047, -0.819273676763145, -0.779254118891752, 1.13734662396304, 
-0.189212077733243, 1.62723080758521, -0.979259176936454, 1.14316624823637, 
2.91560630534064, 0.544678587889513, 0.104127307592218, 0.548266027482326, 
2.09782272529516, -0.405642732646619, -0.767523596762102, -0.101666159527356, 
0.478216111399646, 1.99281202677566, -2.226625310068, -0.971517903790143, 
0.460258073138533, -2.89835631489168, -1.02171119729811), ind_var2 = c(-0.056357182811544, 
1.74174805302751, 0.726184590489127, -0.776468741542423, -0.382713389335797, 
-2.04718702133114, 0.831366181579827, -0.213090131848065, 0.840865733882644, 
1.22835392560235, 0.157950531820239, 2.06119246289913, -0.956157941014712, 
-1.08971104497602, 0.326241704298168, 1.92200778034698, 0.688832722217709, 
-0.627922012586111, -1.19199346650355, 4.22350716099696, 0.641422750933785, 
-2.51080407306521, -2.48755232089754, 0.786465747299846, -1.75767028255026, 
-3.1809952588847, -1.16180005417099, 1.62222731815135, -0.36774662856744, 
-1.08013180924562, -0.792625832269249, 0.0354459155484843, 0.739265747174507, 
1.46933161619649, 0.665910133217599, 0.187823805723774, 2.56835385685832, 
-0.690151675677563, 0.698293566284355, -2.16814193217446, -1.49261328970516, 
0.676123306999542, -0.3939491038487, 0.448077244911608, 0.875734079074383, 
-2.86089580463621, 0.604268757076813, -1.64354489300732, 2.45923451123531, 
-1.68604842945783, -1.9184819589674, 0.139599937397156, 0.828244213896308, 
-1.75129154686091, -2.63929211963569, -0.543288071994073, -0.438679067953734, 
0.192090404456049, 0.758062917239584, 5.25351678020715, -0.277581138478905, 
0.119360139881858, 0.428014862847672, 2.2085245244809, 1.6315453284043, 
0.406134966449986, -1.95269069535625, -1.44363400477165, -0.773787305174728, 
-1.87725581196967, -0.173579458092002, 0.828185227827978, -0.753314550989367, 
2.55617987716488, 1.6298004240679, -2.21082666011452, -1.2473960162524, 
-2.36940584906052, 0.531174618968768, 2.62463381810192, -0.273642107149701, 
-0.932988862867355, -1.07788635500683, -0.674291949186377, -0.86325278256275, 
-2.40754111826735, -1.27808264400922, 0.177596193414942, -1.76242219594059, 
-1.03192825321543, -0.870426991870862, 0.907721012331873, -0.439384772692009, 
-1.73676155170012, -1.14685643668553, 0.355921250966228, 0.369132512048539, 
-1.03839194256396, 1.67059937513388, -1.32434182747233), ind_var3 = c(-1.1389104968572, 
-1.65852944320507, -1.45705577426981, 1.07794506870353, 0.719224058000476, 
-0.158461497822828, 0.705353993877171, 0.337767898018486, 0.117250430739658, 
-0.943398774117966, 0.0329809151250609, -0.568980218136715, 0.928266346136966, 
1.05631907220357, -0.0736055811494815, 0.196830300827318, -0.13576295582571, 
0.257537068142104, -0.137358419008261, -3.0554298580581, -0.533447743252316, 
1.12258694757551, 1.01687632724484, -1.79571198682012, 0.0148816879851791, 
0.82485066910626, 1.00423601009619, -1.07647074570615, 0.470091204928795, 
2.03233021484527, 0.0386841839290024, 0.593792838064128, -1.04728378442583, 
0.00874708446552375, -0.980903401411594, -1.00464434293468, -0.422762600910394, 
-0.42186665574121, 0.785678338823868, 0.452762774537635, 0.146780016995895, 
0.188940756286868, -0.510331441771421, 0.857829724013878, -1.14239581375406, 
1.70863954753159, -0.45918654843729, 0.0576603952242708, -1.27129923558338, 
2.02258278000593, 0.40380866400308, -0.654966856348495, 0.174065512343151, 
0.0275895676352105, 0.918865223950716, -0.584475829976857, -1.19524511596668, 
-0.487679955982114, -0.369099439891801, -2.99052050986791, 1.48199456815231, 
-0.982177118355558, 1.1861353538926, -1.08400989832084, -0.611798044606918, 
0.195029407984118, -0.933873607869469, 0.932982555282905, 0.749446947724109, 
0.309289116358974, 0.490082369957284, -0.479016122713183, 0.224163061951812, 
-1.55318448145768, -1.60841407694929, 0.0313841417028764, 0.529735266681235, 
0.487000304158991, 0.182326460494007, -1.00576805100532, -0.718578942204117, 
0.384314741454849, 0.633681783832062, 0.683973793799741, 0.200446142331914, 
0.376184166146214, -0.459051327415705, 0.352483771659012, 1.13367389882802, 
1.61456716867767, 0.113332066436203, 0.828244743171307, -0.302128248121384, 
-0.0394767029347994, 0.624579306812765, -0.613476676670482, -0.735579500581425, 
0.833063484439717, -0.353751888509078, 0.351207888901893)), class = "data.frame", row.names = c(NA, 
-100L))

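Are you looking to do something like this?

# Collect the dependent and independent variable names by their prefix
dep_cols <- grep('dep', names(df1), value = TRUE)
ind_cols <- grep('ind', names(df1), value = TRUE)

# Fit one stepwise-selected model per dependent variable
models <- lapply(dep_cols, function(x) step(lm(reformulate(ind_cols, x), 
                                    data = df1), direction = "both"))
# Predict every row of df1 (including rows where the response is NA)
new_data <- lapply(models, function(x) data.frame(value = 
                            predict.lm(object = x, newdata = df1)))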

You can also combine the two lapply calls into one, but I kept them separate for clarity.
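As a rough sketch of that combined version, the snippet below fits, selects, and predicts in a single pass per dependent variable, then writes the predictions back into the missing cells only. The `toy` data frame, its coefficients, and the names `preds` and `filled` are made up for illustration and are not part of the original post; with the real data you would use `df1` in place of `toy`.

```r
# Hypothetical toy data standing in for df1: two responses with a few NAs
# and three candidate predictors.
set.seed(42)
n <- 60
toy <- data.frame(
  ind_var1 = rnorm(n),
  ind_var2 = rnorm(n),
  ind_var3 = rnorm(n)
)
toy$dep_var1 <- 1.5 * toy$ind_var1 - toy$ind_var2 + rnorm(n, sd = 0.5)
toy$dep_var2 <- 0.8 * toy$ind_var3 + rnorm(n, sd = 0.5)
toy$dep_var1[c(1, 2)] <- NA
toy$dep_var2[c(5, 9, 20)] <- NA

dep_cols <- grep("^dep", names(toy), value = TRUE)
ind_cols <- grep("^ind", names(toy), value = TRUE)

# One pass per dependent variable: fit the full model, run stepwise
# selection, and predict for every row. sapply() returns a matrix with one
# column per dependent variable because each prediction vector has n entries.
preds <- sapply(dep_cols, function(dv) {
  full <- lm(reformulate(ind_cols, response = dv), data = toy)
  best <- step(full, direction = "both", trace = 0)  # trace = 0 silences the log
  predict(best, newdata = toy)
})

# Overwrite only the missing cells; observed values are left untouched.
filled <- toy
for (dv in dep_cols) {
  miss <- is.na(filled[[dv]])
  filled[[dv]][miss] <- preds[miss, dv]
}
```

Keeping `preds` as a matrix with one column per dependent variable makes the fill-in step a simple indexed assignment. If you also need the selected models (for example to inspect which predictors survived), the two-step version above is probably the more convenient one.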

