How can I calculate the mean variable importance across the elements of a list?

I'm training a random forest three times and saving the variable importance from each run in a list (using the caret package). How can I calculate the mean of each feature, where it exists? For example, how can I calculate the mean of the three "Overall" values for "ESR"? (I am going to train this algorithm a thousand times.) Here is my example:

[[1]]
rf variable importance


  only 20 most important variables shown (out of 119)

                 Overall
Albumin           100.00
age                97.36
PR                 60.18
RR                 42.41
Weight             35.26
SystolicBP         32.14
Cancers1           29.79
ESR                27.66
Neutrophyl         26.98
CPK                25.68
EjectionFraction   25.59
BMI                24.42
Calcium            23.87
WBC                22.36
Urea               22.01
LDH                21.23
FBS                20.21
Ddimer             19.32
HB                 18.99
Lymphocyte         18.78

[[2]]
rf variable importance

  only 20 most important variables shown (out of 119)

                 Overall
age               100.00
FBS                57.80
WBC                53.88
PR                 53.84
Neutrophyl         53.52
Weight             52.31
HB                 51.69
LDH                50.15
Urea               49.31
Albumin            47.05
Lymphocyte         46.87
CPK                46.54
SystolicBP         45.64
Calcium            44.87
ESR                43.54
Ferritin           43.03
CRP                43.00
PLT                42.83
Creatinine         42.53
EjectionFraction   41.43
[[3]]
rf variable importance

  only 20 most important variables shown (out of 119)

                 Overall
age               100.00
Albumin            43.41
Weight             24.88
FBS                24.63
BS                 23.31
PR                 21.47
LDH                21.06
Neutrophyl         20.68
BMI                17.94
EjectionFraction   17.29
CPK                16.49
WBC                16.11
ALP                15.72
RR                 15.28
Lymphocyte         14.94
Cancers1           14.68
CRP                14.50
ESR                14.38
Ddimer             13.05
Ferritin           12.96

Can I create a data frame that stores the features and their Overall values? Thanks for helping. This is my code:

prediction_value_rf=list()
importance_rf=list()
auc_rf=list()
weight_rf=list()
for ( i in 1:1000){
   resample_death <- death[sample(nrow(death), size=300), ]
   resample_alive <-alive[sample(nrow(alive), size=300), ]
   f_dataset=rbind(resample_alive,resample_death)
   inx <- sample.split(seq_len(nrow(f_dataset)), 0.25)
   trainData<- f_dataset[!inx, ]
   testData <-  f_dataset[inx, ]
   rf_fit <- train(vital_status ~ ., 
                   data = trainData, 
                   method = "rf",
   )
   pred=predict(rf_fit, testData[,-109])
   pred1=predict(rf_fit, testData[,-109],type='prob')
   prediction_value_rf[[i]]=pred1[2]
   auc=auc(testData$vital_status,as.numeric(pred1[[2]]),direction="<", levels = levels(testData$vital_status))
   auc_rf[[i]]=auc
   a=varImp(rf_fit,scale = TRUE)
   importance_rf[[i]] <- a
   weight_rf[[i]]=max(rf_fit$results$Accuracy)
}

In the end, I want to calculate the mean Overall importance of every feature (I want to create an ensemble model). My dataset contains 109 features and 4200 samples.

> dput(importance_rf)
list(structure(list(importance = structure(list(Overall = c(100, 
32.9191368970689, 0, 29.4889011862606, 24.8664587940577, 21.8746288172869, 
21.7051171149606, 20.0868919191658, 20.3678665772965, 20.2873319598582, 
33.7597621482843, 42.1891066454062, 22.7027798691687, 17.0766042463516, 
39.4559095867264, 17.9431725056776, 23.2881573588367, 5.04721532342669, 
22.3290849893345, 20.7266835722104, 21.5723519894789, 19.5211504808207, 
21.2794742178794, 20.1624361665348, 13.7420140365184, 31.7941409073075, 
20.9409991203303, 30.4229311296897, 11.5187371425859, 12.8487688047673, 
9.40749461290917, 10.361793419014, 32.5677389075859, 26.5411449178312, 
23.3996095888034, 2.84823906954271, 10.0257295515002, 2.27406632480383, 
0.221285401034356, 0.844517489791465, 1.97286969198767, 0.0909347758420391, 
0.541007254389242, 0.359718315763083, 1.26912866459011, 0.158954429130366, 
0.245159217854806, 1.43768928047267, 0.796627703857018, 0.0731764363395144, 
1.72357935713514, 0.424562470997031, 3.38312715168264, 1.88770244332681, 
0.0314985706869475, 0, 0.65427952713802, 0, 0.0171557103229226, 
0.709743254593806, 1.13539938842206, 0.0367104133426984, 2.95211595985093, 
0, 0.582868854914444, 0.393813676879418, 1.15732422255054, 2.24940561099934, 
1.73472209382337, 1.34428847541862, 1.15486784386305, 0, 0.689216959226089, 
0.625678629482648, 1.81161997423301, 0.433030827900777, 10.9106578268112, 
2.24295278032112, 18.176936900799, 1.74711580562318, 1.45310012173878, 
0.952143653091356, 1.16652405720194, 1.11866015943186, 2.68527336222893, 
1.12853921993574, 5.10727247259446, 1.93994049536545, 1.36475795626174, 
2.95717137358439, 0.115367165512589, 0, 1.45815337045876, 0, 
1.78943634306828, 5.71749991297189, 2.43536004133198, 1.27231795918686, 
11.4771984230702, 3.0971032186365, 0.708058471655881, 0.170261025718881, 
3.37435307537382, 1.56044494248123, 1.09294450754124, 0, 2.25592933845801, 
2.30276525800757, 1.86149986210819, 1.46145976307003, 1.26858067553346, 
2.11041986636824, 0.0902116364175813, 1.54299863875175, 0, 0.269632340125967, 
1.88548693593634, 4.47233507072462, 0.66752451890319)), class = "data.frame", row.names = c("age", 
"Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP", 
"ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium", 
"Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", "Ferritin", 
"HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", 
"PotassiumK", "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction", 
"TotalLungInvolvementRank", "TotalLungInvolvementPercent", "sex2", 
"Type.of.heart.disease1", "Type.of.heart.disease2", "Type.of.heart.disease9", 
"Unilateral.paralysis1", "Ulcers1", "Obesity.BMI.above.351", 
"Peripheral.artery.disease1", "organ.involment.from.diabetes1", 
"organ.involment.from.diabetes2", "organ.involment.from.diabetes3", 
"UsingDrugHistory1", "UsingAlcoholHistory1", "Transplantation1", 
"SeverityofKidneyDisease1", "SeverityofKidneyDisease2", "SeverityofKidneyDisease3", 
"SeverityChronicliverdisease1", "SeverityChronicliverdisease2", 
"SeverityChronicliverdisease3", "SeverityChronicliverdisease4", 
"SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1", 
"Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1", 
"KidneyTransplantation1", "Immunedeficiencydisease1", "Hypothyroidism1", 
"Hypertention1", "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1", 
"HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1", 
"Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1", 
"Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1", 
"Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", "Dyspnea1", 
"DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", "CardiacArrhythmia1", 
"Body_Pain1", "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1", 
"PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"), 
    structure(list(importance = structure(list(Overall = c(100, 
    36.8463357663146, 0, 20.5921448468941, 35.0980630859042, 
    15.7098956910968, 27.5542325637653, 22.3935810225052, 25.6062709809081, 
    18.9072078537409, 30.5428709528983, 26.4061314161858, 27.2933977255992, 
    18.3744993875278, 57.5115149169245, 14.4361277134982, 49.9265957132235, 
    6.10831602661626, 28.2527379885906, 23.0147565449908, 32.7997892888894, 
    22.7055707536584, 36.9763807158356, 28.9941599048441, 17.8186386653819, 
    31.2682240107287, 26.2894098494535, 41.1751827476675, 22.6316241605114, 
    16.9314172346857, 14.4927913128733, 13.1792980470757, 44.2836496383372, 
    32.7246002717468, 30.3912750391576, 10.0409713536124, 9.83444013035946, 
    2.50470824612248, 1.72055335723373, 1.05083165735798, 1.56193393834476, 
    0.233521622728958, 1.08064736921506, 0.555709266569136, 2.40106539585553, 
    0.291833555475466, 0.380999891346632, 2.56592221397732, 1.62107348934456, 
    0.504647559430998, 1.19859835755469, 0, 1.4382135880929, 
    1.94514657535966, 0, 0.0569205442253742, 0.44589056596685, 
    0.0539230755197555, 0, 0.055077983652405, 1.24527213390211, 
    0, 1.36267778294481, 0.151259347248717, 0.499919817645286, 
    0, 2.79981213016671, 2.72663427247346, 1.93725253183476, 
    2.70715099933653, 1.99722906280419, 0, 0.111342938271961, 
    1.2426657762317, 2.15186257620788, 0.584084013981451, 9.87542370836023, 
    3.21493418783175, 14.6556614893423, 0.67462103889104, 0.787088521176588, 
    2.61946726039402, 2.8099384934716, 0.377053883833586, 2.2824838493133, 
    1.12217532020233, 3.44210364347885, 2.61343827037804, 9.58864870521531, 
    1.77823199575717, 0, 0, 0.828679129518211, 0, 2.73842874693014, 
    14.5506870851474, 0.390367251047195, 0.811902694072225, 15.5803912323052, 
    4.18258978600944, 2.13546475796113, 2.66088800284236, 2.97761832225233, 
    3.54039994200135, 2.44519084017892, 0.737528372419208, 2.20708600548186, 
    4.12502178170407, 3.1835668678093, 7.61195991815971, 2.35303302862437, 
    5.70342032074721, 0.409606955773683, 2.4977310780031, 0.0107020031498121, 
    0.268000372472171, 2.32396173268619, 1.64515893404575, 0.868523484401606
    )), class = "data.frame", row.names = c("age", "Weight", 
    "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP", 
    "ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium", 
    "Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", 
    "Ferritin", "HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", 
    "PLT", "PR", "PhosphorP", "PotassiumK", "SodiumNA", "Totalbilirubin", 
    "Urea", "WBC", "EjectionFraction", "TotalLungInvolvementRank", 
    "TotalLungInvolvementPercent", "sex2", "Type.of.heart.disease1", 
    "Type.of.heart.disease2", "Type.of.heart.disease9", "Unilateral.paralysis1", 
    "Ulcers1", "Obesity.BMI.above.351", "Peripheral.artery.disease1", 
    "organ.involment.from.diabetes1", "organ.involment.from.diabetes2", 
    "organ.involment.from.diabetes3", "UsingDrugHistory1", "UsingAlcoholHistory1", 
    "Transplantation1", "SeverityofKidneyDisease1", "SeverityofKidneyDisease2", 
    "SeverityofKidneyDisease3", "SeverityChronicliverdisease1", 
    "SeverityChronicliverdisease2", "SeverityChronicliverdisease3", 
    "SeverityChronicliverdisease4", "SeverityChronicliverdisease9", 
    "Schizophrenia1", "Rheumatologicaldiseases1", "Pregnant1", 
    "Neurologicaldiseases1", "LiverTransplantation1", "KidneyTransplantation1", 
    "Immunedeficiencydisease1", "Hypothyroidism1", "Hypertention1", 
    "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1", 
    "HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1", 
    "Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1", 
    "Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
    "WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
    "Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
    "Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
    "Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
    "LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1", 
    "Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", 
    "Dyspnea1", "DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", 
    "CardiacArrhythmia1", "Body_Pain1", "Bleeding1", "Ataxia1", 
    "Anorexia1", "PCRCOVID19Test1", "PCRCOVID19Test2")), model = "rf", 
        calledFrom = "varImp"), class = "varImp.train"), structure(list(
        importance = structure(list(Overall = c(100, 36.4519408382731, 
        0.0121282468302786, 27.9982404793903, 19.4487163883379, 
        24.6079653972917, 14.1539998143239, 18.684018340339, 
        20.1182663550791, 17.4200861293186, 46.6309831468223, 
        52.2217679510578, 28.5910698857479, 16.845796014194, 
        31.6509235655573, 17.1000574614637, 27.8424176478161, 
        5.69845064904499, 21.3838903337718, 20.217605303817, 
        19.8702958841878, 22.3737582989512, 33.0788664305301, 
        20.6035947546629, 16.3220426343042, 23.4809287675538, 
        23.1749036748423, 57.122094059206, 12.2409421568247, 
        11.234114301956, 15.7946508155502, 8.80563729211453, 
        20.2205078755919, 20.3091908316546, 27.7497357152039, 
        3.8622908315769, 12.8894291926347, 5.96701805516155, 
        0.761922263853243, 1.41991036581607, 1.54560737492769, 
        0.825161722105208, 0.0172016746252156, 0.693982409239905, 
        0, 0.358366468201754, 1.74812586771487, 2.2746344067366, 
        0.745595100629448, 0.465199425668223, 0.408092232849501, 
        0.115358703965213, 0.0358338604150282, 2.88640197248697, 
        0, 0.288302498762889, 0.332551323637155, 0.0121282468302786, 
        0, 1.03515126482736, 1.1213600137207, 0.329413397366096, 
        2.0612368962315, 0, 0.610994615626186, 1.0215655608971, 
        3.90651448858199, 1.73374217783332, 1.47244358073369, 
        2.20534241559288, 0.173681720638885, 0, 0.631950099628902, 
        0.132328128708788, 2.92435478031454, 1.03537122788376, 
        4.74067414123091, 1.77981701502525, 13.1150432121738, 
        0.720556880972878, 1.20366662244445, 1.19169376389038, 
        1.86442992849398, 0.518200723424615, 2.278501378269, 
        1.23638371282217, 3.66947066761794, 2.03933409738165, 
        1.25289331603719, 1.01627904400807, 0.0324453169731015, 
        0, 2.29817177168672, 0, 1.53194610140319, 7.15322639329996, 
        0.759542631415349, 1.53353473284619, 4.77390474517756, 
        1.05656481042379, 0.699450154375729, 1.16224285818854, 
        3.65223350861514, 1.93274707207956, 1.57589588221639, 
        0.449432695377871, 1.36863730886437, 2.11275137384133, 
        3.29450357362525, 1.08676677214028, 2.18565092410049, 
        1.15456248328987, 0.492245547306216, 1.59592156033113, 
        0.0129367966189638, 0.514499765305734, 1.58591810753971, 
        1.84832826238423, 0.807564130566264)), class = "data.frame", row.names = c("age", 
        "Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", 
        "DiastolicBP", "ALP", "ALT", "AST", "Albumin", "BS", 
        "CPK", "CRP", "Calcium", "Creatinine", "Ddimer", "Directbilirubin", 
        "ESR", "FBS", "Ferritin", "HB", "LDH", "Lymphocyte", 
        "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", "PotassiumK", 
        "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction", 
        "TotalLungInvolvementRank", "TotalLungInvolvementPercent", 
        "sex2", "Type.of.heart.disease1", "Type.of.heart.disease2", 
        "Type.of.heart.disease9", "Unilateral.paralysis1", "Ulcers1", 
        "Obesity.BMI.above.351", "Peripheral.artery.disease1", 
        "organ.involment.from.diabetes1", "organ.involment.from.diabetes2", 
        "organ.involment.from.diabetes3", "UsingDrugHistory1", 
        "UsingAlcoholHistory1", "Transplantation1", "SeverityofKidneyDisease1", 
        "SeverityofKidneyDisease2", "SeverityofKidneyDisease3", 
        "SeverityChronicliverdisease1", "SeverityChronicliverdisease2", 
        "SeverityChronicliverdisease3", "SeverityChronicliverdisease4", 
        "SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1", 
        "Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1", 
        "KidneyTransplantation1", "Immunedeficiencydisease1", 
        "Hypothyroidism1", "Hypertention1", "Hyperlipidemia1", 
        "Historyofsmoking1", "HistoryofHookah1", "HeartTransplantation1", 
        "HIV1", "FattyLiver1", "Diabetes1", "Chronicliverdisease1", 
        "Chronickidneydisease1", "CardiovascularDisease1", "Cancers1", 
        "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
        "WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
        "Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
        "Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
        "Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
        "LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", 
        "Headace1", "Fever1", "Fatigue1", "EyeConjunctivitis1", 
        "Epigastric1", "Dyspnea1", "DryCough1", "Dizziness1", 
        "Diarrhea1", "Chestpain1", "CardiacArrhythmia1", "Body_Pain1", 
        "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1", 
        "PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"))

For this part:

how can I calculate the mean of each feature if it exists? for example, how can I calculate the mean of three overall "ESR"?

Because you have already generated the list, you can write a function that selects the row for a given feature name, apply it to each element of the list, flatten the result, and then calculate the mean. If the feature doesn't exist in some element, the lookup returns NA there, and it can be excluded from the mean calculation by using na.rm = TRUE.

For example, this resembles your list:

mylist <- list(structure(list(Overall = c(100, 97.36, 60.18, 42.41, 35.26, 
32.14, 29.79, 27.66, 26.98, 25.68, 25.59, 24.42, 23.87, 22.36, 
22.01, 21.23, 20.21, 19.32, 18.99, 18.78)), class = "data.frame", row.names = c("Albumin", 
"age", "PR", "RR", "Weight", "SystolicBP", "Cancers1", "ESR", 
"Neutrophyl", "CPK", "EjectionFraction", "BMI", "Calcium", "WBC", 
"Urea", "LDH", "FBS", "Ddimer", "HB", "Lymphocyte")), structure(list(
    Overall = c(100, 57.8, 53.88, 53.84, 53.52, 52.31, 51.69, 
    50.15, 49.31, 47.05, 46.87, 46.54, 45.64, 44.87, 43.54, 43.03, 
    43, 42.83, 42.53, 41.43)), class = "data.frame", row.names = c("age", 
"FBS", "WBC", "PR", "Neutrophyl", "Weight", "HB", "LDH", "Urea", 
"Albumin", "Lymphocyte", "CPK", "SystolicBP", "Calcium", "ESR", 
"Ferritin", "CRP", "PLT", "Creatinine", "EjectionFraction")), 
    structure(list(Overall = c(100, 43.41, 24.88, 24.63, 23.31, 
    21.47, 21.06, 20.68, 17.94, 17.29, 16.49, 16.11, 15.72, 15.28, 
    14.94, 14.68, 14.5, 14.38, 13.05, 12.96)), class = "data.frame", row.names = c("age", 
    "Albumin", "Weight", "FBS", "BS", "PR", "LDH", "Neutrophyl", 
    "BMI", "EjectionFraction", "CPK", "WBC", "ALP", "RR", "Lymphocyte", 
    "Cancers1", "CRP", "ESR", "Ddimer", "Ferritin")))

Here is how to calculate the mean of ESR, which exists in all elements, and of CRP, which is missing from one of the elements:

mylist |> lapply(function(dat) dat["ESR", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.52667

mylist |> lapply(function(dat) dat["CRP", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.75

Because you have many features, you can create another function to apply this step to each feature. For example:

features <- c("ESR", "CRP", "CPK", "WBC", "LDH")
feature_mean <- function(feature_name){
    out <- lapply(mylist, function(dat) dat[feature_name, "Overall"]) |>
        unlist() |> mean(na.rm = TRUE) |>
        setNames(paste0("mean_", feature_name))
    return(out)
}

features |> lapply(feature_mean) |> unlist()

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#28.52667 28.75000 29.57000 30.78333 30.81333 
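
To address the data frame part of the question, the same idea can be extended to every feature at once. The following is a minimal base R sketch, assuming each list element is a data frame with an Overall column and feature names as row names. The names toy_list, all_features, imp_matrix and mean_importance are hypothetical, and the values are made up for illustration. It uses match() for the lookup because row indexing on a data frame can partially match row names.

```r
# Toy list in the same shape as mylist (values are made up for illustration)
toy_list <- list(
  data.frame(Overall = c(100, 40, 20), row.names = c("age", "ESR", "CRP")),
  data.frame(Overall = c(100, 30),     row.names = c("age", "ESR")),
  data.frame(Overall = c(90, 50, 10),  row.names = c("age", "ESR", "CRP"))
)

# Every feature name that appears in at least one element
all_features <- unique(unlist(lapply(toy_list, rownames)))

# One column per run; a feature that is missing from a run becomes NA
imp_matrix <- sapply(toy_list, function(dat) {
  dat$Overall[match(all_features, rownames(dat))]
})

# Data frame of features, their mean importance, and the number of runs
# in which each feature was present
mean_importance <- data.frame(
  feature      = all_features,
  mean_overall = unname(rowMeans(imp_matrix, na.rm = TRUE)),
  n_runs       = unname(rowSums(!is.na(imp_matrix)))
)
mean_importance
```

The n_runs column is there because a mean taken over two runs is not directly comparable to a mean taken over three, which matters when only the top 20 variables of each run are kept.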

EDIT

The synthetic data used in the previous example, mylist, contains only the "Overall" data frame in each of its elements, so the feature can be extracted directly from the data using lapply. However, the actual data that you provided in the updated question, importance_rf, has more than one object in each of its elements, with the "Overall" data frame as the first one. This difference is the cause of the error you showed in the comment. To apply the extraction, the "Overall" data frames should be extracted first, using lapply(function(list) list[[1]]), and then the previous steps can be applied.

# Extract mean ESR 
importance_rf |> 
 lapply(function(list) list[[1]]) |> 
 lapply(function(dat) dat["ESR", "Overall"]) |> 
 unlist() |> 
 mean(na.rm = TRUE)
#[1] 23.98857

# Extract mean CRP
importance_rf |> 
 lapply(function(list) list[[1]]) |> 
 lapply(function(dat) dat["CRP", "Overall"]) |> 
 unlist() |> 
 mean(na.rm = TRUE)
#[1] 17.4323

A {base R} way

The previous steps can be applied to a vector of features as follows:

features <- c("ESR", "CRP", "CPK", "WBC", "LDH")

feature_mean <- function(feature_name){
     out <- importance_rf |> 
         lapply(function(list) list[[1]]) |>
         lapply(function(dat) dat[feature_name, "Overall"])|> 
         unlist() |> mean(na.rm = TRUE) |> 
         setNames(paste0("mean_",feature_name))
     return(out)
}

# Extract the mean values

features |> lapply(feature_mean) |> unlist()

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#23.98857 17.43230 26.19575 26.52498 30.44491 

A brief explanation about the code:

  • lapply(function(list) list[[1]]) extracts the first element of each element in the importance_rf list, which is the data frame that contains the feature data.
  • dat[feature_name, "Overall"] extracts the value of the targeted feature, feature_name, from each extracted data frame. Only one feature is extracted from each data frame at a time.
  • unlist() converts the extracted values from a list to a numeric vector.
  • setNames creates a name for the resulting mean, to make it easy to identify which feature it belongs to.

All the functions used here belong to base R, so you don't need to install any external package. Another option is to combine base R functions with functions from the purrr package.
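
As a further base R sketch, all features can be summarised in one pass when every run reports the same features, as in the dput output of importance_rf. The objects below are mock stand-ins with invented values, and make_mock, mock_rf and summary_rf are hypothetical names. It assumes each element is a list whose importance element is a data frame with an Overall column, as the dput output suggests.

```r
# Mock objects shaped like the elements of importance_rf: a list whose
# `importance` element is a data frame with an `Overall` column.
# (Values are invented; the real objects also carry class "varImp.train".)
make_mock <- function(values) {
  list(importance = data.frame(Overall = values,
                               row.names = c("age", "ESR", "CRP")),
       model = "rf", calledFrom = "varImp")
}
mock_rf <- list(make_mock(c(100, 20, 10)),
                make_mock(c(100, 30, 20)),
                make_mock(c(100, 40, 30)))

# Pull out the data frames by name instead of by position
imp_frames <- lapply(mock_rf, function(x) x$importance)

# All runs share the same features here, so take them from the first run
feature_names <- rownames(imp_frames[[1]])

# Rows = features, columns = runs
imp_matrix <- vapply(imp_frames,
                     function(dat) dat[feature_names, "Overall"],
                     numeric(length(feature_names)))

# Mean and spread of each feature across runs, most important first
summary_rf <- data.frame(feature      = feature_names,
                         mean_overall = rowMeans(imp_matrix),
                         sd_overall   = apply(imp_matrix, 1, sd))
summary_rf <- summary_rf[order(-summary_rf$mean_overall), ]
summary_rf
```

Reporting the standard deviation next to the mean is a design choice rather than part of the original answer; with a thousand resampled fits it gives a rough idea of how stable each feature's importance is.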

A {purrr} way

library(purrr)

importance_rf |> 
  map(pluck(1,1)) |> 
  map(function(dat) set_names(dat[features,], features)) |>
  as.data.frame() |> 
  rowMeans() |> 
  set_names(paste0("mean_", features))

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#23.98857 17.43230 26.19575 26.52498 30.44491

These steps are much shorter than the base R ones above, but what each step does may be less obvious.

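Note that map is similar to lapply, and that pluck(x, 1, 1) is equivalent to x[[1]][[1]]. In the pipeline above, however, pluck(1,1) is evaluated on its own and returns 1, so map(pluck(1,1)) behaves like map(1), which extracts x[[1]] from each element.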

A brief explanation about the code:

  • map(pluck(1,1)) extracts the data frames, similar to lapply(function(list) list[[1]]) above.
  • map(function(dat) set_names(dat[features,], features)) extracts the values of all the listed features, similar to dat[feature_name, "Overall"] above.

There is a difference:

In the base R way above, one feature is extracted from all data frames and its mean is calculated, and then the next feature is processed the same way.

In the purrr way, all the targeted features are extracted from each data frame in the list, and the results are then combined into a new data frame with as.data.frame so that each row represents a feature. Then rowMeans is used to calculate the mean of each feature across the columns.

Note that you can check the result of each step by running the pipeline only up to that |> pipe. For example, importance_rf will show all objects in each element. importance_rf |> map(pluck(1,1)) will show only the data frame objects.

Updates for including weighted means

Here is a simple example of how to calculate weighted means of each feature in your list. Suppose you have this list:

some.list <- list(L1 = c(a = 2, b = 4, c = 7), 
                  L2 = c(a = 5, b = 5, c = 2), 
                  L3 = c(a = 3, b = 3, c = 6))
some.list
$L1
a b c 
2 4 7 

$L2
a b c 
5 5 2 

$L3
a b c 
3 3 6 

And suppose you have the following weight values for L1, L2, and L3 in the list:

weight <- c(w.L1 = 0.5, w.L2=0.6, w.L3 = 0.9)
weight
w.L1 w.L2 w.L3 
 0.5  0.6  0.9 

To calculate the weighted means of a, for example, you need this calculation:

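weighted mean of a = (w1*a1 + w2*a2 + w3*a3) / (w1 + w2 + w3)

where a1, a2 and a3 are the values of a in L1, L2 and L3, and w1, w2 and w3 are the corresponding weights.

You can get this by multiplying each value of a in the list by its respective normalized weight and summing the results. In this case, the normalized weight for w1 is w1/(w1+w2+w3).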

To do these steps in R:

norm.weight <- weight/sum(weight)
norm.weight
w.L1 w.L2 w.L3 
0.25 0.30 0.45 

# weighted means of a,b, and c
some.list |> map2(norm.weight, `*`) |> as.data.frame() |> rowSums()
   a    b    c 
3.35 3.85 5.05 
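
The same numbers can also be obtained without purrr. This is a small sketch using weighted.mean() from base R's stats package on the some.list and weight objects defined above; value_matrix and weighted are hypothetical names.

```r
some.list <- list(L1 = c(a = 2, b = 4, c = 7),
                  L2 = c(a = 5, b = 5, c = 2),
                  L3 = c(a = 3, b = 3, c = 6))
weight <- c(w.L1 = 0.5, w.L2 = 0.6, w.L3 = 0.9)

# Rows = features (a, b, c), columns = list elements (L1, L2, L3)
value_matrix <- do.call(cbind, some.list)

# weighted.mean() divides by sum(w) itself, so the raw weights can be passed
weighted <- apply(value_matrix, 1, weighted.mean, w = weight)
weighted
#    a    b    c 
# 3.35 3.85 5.05 
```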

Applying these mock weight values to your importance_rf list and the features in the example, we get:

importance_rf |> 
    map(pluck(1,1)) |> 
    map(function(dat) set_names(dat[features,], features)) |>
    map2(norm.weight, `*`) |> 
    as.data.frame() |> 
    rowSums()
    
     ESR      CRP      CPK      WBC      LDH 
23.68084 17.36211 26.72970 25.59180 31.29827 
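
If the weights are meant to come from the weight_rf list built in the question's loop, a sketch along the following lines may help. It is an assumption that the stored accuracies are the intended weights (the example above used mock weights). imp_runs and acc_runs are hypothetical stand-ins with invented values, not the real importance_rf and weight_rf.

```r
# Hypothetical stand-ins: importance tables and accuracies from three runs
imp_runs <- list(
  data.frame(Overall = c(100, 40),     row.names = c("age", "ESR")),
  data.frame(Overall = c(100, 20, 10), row.names = c("age", "ESR", "CRP")),
  data.frame(Overall = c(100, 30, 30), row.names = c("age", "ESR", "CRP"))
)
acc_runs <- list(0.8, 0.6, 0.6)   # same shape as weight_rf in the question

weights <- unlist(acc_runs)
all_features <- unique(unlist(lapply(imp_runs, rownames)))

# Rows = features, columns = runs; features absent from a run become NA
imp_matrix <- sapply(imp_runs, function(dat) {
  dat$Overall[match(all_features, rownames(dat))]
})
rownames(imp_matrix) <- all_features

# na.rm = TRUE drops a missing value together with its weight, so each
# feature is averaged only over the runs in which it appears
weighted_imp <- apply(imp_matrix, 1, weighted.mean, w = weights, na.rm = TRUE)
weighted_imp
```

Here CRP is averaged over the last two runs only, with the weights renormalized over those two runs.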
