Difficulty combining lists, characters, and numbers into a data frame

I'm lost on how to combine my data into a usable data frame. I have a list of lists of character and numeric vectors. Here is a working example of my code so far:

remove(list=ls())

# Headers for each of my column names
headers <- c("name","p","c","prophylaxis","control","inclusion","exclusion","conversion excluded","infection criteria","age criteria","mean age","age sd")

#_name = author and year
#_p = no. in experimental arm.
#_c = no. in control arm
#_abx = antibiotic used
#_con = control used
#_inc = inclusion criteria
#_exc = exclusion criteria
#_coexc = was conversion to open excluded?
#_infxn = infection criteria
#_agecrit = age criteria
#_agemean = mean age of study
#_agesd = sd age of study

# Passos 2016
passos_name <- c("Passos","2016")
passos_p <- 50
passos_c <- 50
passos_abx <- "cefazolin 1g at induction"
passos_con <- "none"
passos_inc <- c("elective LC","symptomatic cholelithiasis","low risk")
passos_exc <- c("renal impairment","hepatic impairment","immunosuppression","regular steroid use","antibiotics within 48H","acute cholecystitis","choledocolithiasis")
passos_coexc <- TRUE
passos_infxn <- c("temperature >37.8C","tachycardia","asthenia","local pain","local purulence")
passos_agecrit <- NULL
passos_agemean <- 48
passos_agesd <- 13.63
passos <- list(passos_name,passos_p,passos_c,passos_abx,passos_con,passos_inc,passos_exc,passos_coexc,passos_infxn,passos_agecrit,passos_agemean,passos_agesd)
names(passos) <- headers

# Darzi 2016
darzi_name <- c("Darzi","2016")
darzi_p <- 182
darzi_c <- 247
darzi_abx <- c("cefazolin 1g 30min prior to induction","cefazolin 1g 6H after induction","cefazolin 1g 12H after induction")
darzi_con <- "NaCl"
darzi_inc <- c("elective LC","first time abdominal surgery") 
darzi_exc <- c("antibiotics within 7 days","immunosuppression","acute cholecystitis","choledocolithiasis","cholangitis","obstructive jaundice",
              "pancreatitis","previous biliary tract surgery","previous ERCP","DM","massive intraoperative bleeding","antibiotic allergy","major thalassemia",
              "empyema")
darzi_coexc <- TRUE
darzi_infxn <- c("temperature >38C","local purulence","intra-abdominal collection")
darzi_agecrit <- c(">18", "<75")
darzi_agemean <- 43.75
darzi_agesd <- 13.30
darzi <- list(darzi_name,darzi_p,darzi_c,darzi_abx,darzi_con,darzi_inc,darzi_exc,darzi_coexc,darzi_infxn,darzi_agecrit,darzi_agemean,darzi_agesd)
names(darzi) <- headers

# Matsui 2014
matsui_name <- c("Matsui","2014")
matsui_p <- 504
matsui_c <- 505
matsui_abx <- c("cefazolin 1g at induction","cefazolin 1g 12H after induction","cefazolin 1g 24H after induction")
matsui_con <- "none"
matsui_inc <- "elective LC"
matsui_exc <- c("emergent","concurrent surgery","regular insulin use","regular steroid use","antibiotic allergy","HD","antibiotics within 7 days","hepatic impairment","chemotherapy")
matsui_coexc <- FALSE
matsui_infxn <- c("local purulence","intra-abdominal collection","distant infection","temperature >38C")
matsui_agecrit <- ">18"
matsui_agemean <- NULL
matsui_agesd <- NULL
matsui <- list(matsui_name,matsui_p,matsui_c,matsui_abx,matsui_con,matsui_inc,matsui_exc,matsui_coexc,matsui_infxn,matsui_agecrit,matsui_agemean,matsui_agesd)
names(matsui) <- headers

# Find unique exclusion criteria in order to create the list of all possible levels
exc <- ls()[grepl("_exc",ls())]
exclist <- sapply(exc,get)
exc.levels <- unique(unlist(exclist,use.names = F))

# Find unique inclusion criteria in order to create the list of all possible levels
inc <- ls()[grepl("_inc",ls())]
inclist <- sapply(inc,get)
inc.levels <- unique(unlist(inclist,use.names = F))

# Find unique antibiotics in order to create the list of all possible levels
abx <- ls()[grepl("_abx",ls())]
abxlist <- sapply(abx,get)
abx.levels <- unique(unlist(abxlist,use.names = F))

# Find unique controls in order to create the list of all possible levels
con <- ls()[grepl("_con",ls())]
conlist <- sapply(con,get)
con.levels <- unique(unlist(conlist,use.names = F))

# Find unique age criteria in order to create the list of all possible levels
agecrit <- ls()[grepl("_agecrit",ls())]
agecritlist <- sapply(agecrit,get)
agecrit.levels <- unique(unlist(agecritlist,use.names = F))

I have been struggling with:

1) Turning each of the _exc, _inc, _abx, _con, and _agecrit vectors into factors using the levels generated at the end of the code block. I have been trying to use a for loop such as:

for (x in exc) {
    as.name(x) <- factor(get(x),levels = exc.levels)
}

This only creates a variable, x, that stores the last parsed list as a factor.
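One possible way around the loop, sketched under the assumption that it is acceptable to keep the vectors in a named list rather than rewriting each variable in place: apply factor() to every element with lapply(), passing the shared levels. The two vectors here are shortened stand-ins for the ones defined above, and exc_names and exc_factors are illustrative names. If the original variables really must be overwritten, assign() accepts the variable name as a string.

```r
# Shortened stand-ins for the *_exc vectors defined in the question
passos_exc <- c("renal impairment", "immunosuppression", "acute cholecystitis")
darzi_exc  <- c("immunosuppression", "acute cholecystitis", "cholangitis")

# Collect the vectors in a named list instead of looking them up one by one
exc_names <- c("passos_exc", "darzi_exc")
exclist <- mget(exc_names)

# Shared levels: every distinct criterion across the studies
exc.levels <- unique(unlist(exclist, use.names = FALSE))

# Convert every element to a factor with the same set of levels
exc_factors <- lapply(exclist, factor, levels = exc.levels)

# Alternative: overwrite the original variables by name.
# assign() takes the name as a string; as.name(x) <- ... is not a valid target.
for (x in exc_names) {
  assign(x, factor(get(x), levels = exc.levels))
}
```

Keeping the factors in one list (exc_factors) tends to be easier to work with later than a set of separately named variables, since the whole collection can be passed to lapply() or sapply() again.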

2) Combine all of my data into a data frame formatted as such:

name, p, c, prophylaxis, control, inclusion, exclusion, conversion excluded, infection criteria, age criteria, mean age, age sd
"Passos 2016", 50, 50, "cefazolin 1g at induction", "none", ["elective LC","symptomatic cholelithiasis","low risk"], ["renal impairment","hepatic impairment","immunosuppression","regular steroid use","antibiotics within 48H","acute cholecystitis","choledocolithiasis"], TRUE, ["temperature >37.8C","tachycardia","asthenia","local pain","local purulence"], NULL, 48, 13.63
...
# [] = factors
# columns correspond to each study's variables (i.e., passos_name, passos_p, passos_c, etc.)
# rows correspond to each study (i.e., passos, darzi, matsui)

I have tried various solutions on StackOverflow, but have not found any that work; for example:

studies <- list(passos,darzi,matsui,ruangsin,turk,naqvi,hassan,sharma,uludag,yildiz,kuthe,koc,maha,tocchi,higgins,mahmoud,kumar)
library(data.table)
rbindlist(lapply(studies,as.data.frame.list))

I suspect my data may not be exactly amenable to a data frame, primarily because I am trying to store a list of factors in a column. Is that allowed? If not, how is this type of data normally stored? My goal is to be able to meaningfully compare these various criteria across studies.

This is too long for a comment, so I am posting it as an "answer":

To start with, have a look at what happens here:

data.frame(name = "Passos, 2016", p = 50)
          name  p
1 Passos, 2016 50

data.frame(name = c("Passos", "2016"), p = 50)
    name  p
1 Passos 50
2   2016 50

In the first call, we created a data frame whose column "name" contains a single entry, "Passos, 2016" (one character string holding both pieces of information), alongside the column "p". All fine. In the second call, I specified the column "name" as you did above, using c("Passos", "2016"). This is a two-element vector, so we get two rows in the data frame: one with name Passos, one with name 2016, and the value of p is recycled.

The latter is almost certainly not what you intended, but it works anyway because R recycles the shorter vector. Now, what do you think happens if I add a vector that contains three elements?

This highlights the main issue with what you are doing: you are trying to build a data frame from many vectors of different lengths. In some cases this is fine, namely when you want the shorter vector to be repeated (in R terms, "recycled"), but that does not look like what you want here.
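To make the length problem concrete, here is a small sketch that reuses the Passos values from the question. data.frame() recycles a shorter column only when the longest column's length is a whole multiple of it, so a three-element column next to a two-element one is rejected. The tryCatch() wrapper is only there so the script keeps running and the error message can be inspected.

```r
# Lengths 2 and 1: the length-1 column is recycled to fill both rows
ok <- data.frame(name = c("Passos", "2016"), p = 50)
nrow(ok)

# Lengths 2, 1 and 3: 3 is not a multiple of 2, so data.frame() raises an error
res <- tryCatch(
  data.frame(
    name = c("Passos", "2016"),
    p = 50,
    inclusion = c("elective LC", "symptomatic cholelithiasis", "low risk")
  ),
  error = function(e) conditionMessage(e)
)

# res now holds the error message as a character string
res
```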

So, my recommendation would be this: try to imagine a matrix and make sure you understand what each element (row and column) is supposed to be. Then specify your data accordingly. If in doubt, look up "tidy data".
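As a rough illustration of that advice, and only one of several reasonable layouts, the sketch below keeps one row per study for the scalar fields and shows two options for a multi-valued field such as the inclusion criteria: a list column (which base R data frames do allow) and a "long" table with one row per study and criterion. The values are taken from the question; the names studies and inclusion_long are illustrative.

```r
# One row per study; scalar fields are ordinary columns
studies <- data.frame(
  name = c("Passos 2016", "Matsui 2014"),
  p    = c(50, 504),
  c    = c(50, 505),
  stringsAsFactors = FALSE
)

# Option 1: a list column holds one vector per row, whatever its length
studies$inclusion <- list(
  c("elective LC", "symptomatic cholelithiasis", "low risk"),
  "elective LC"
)

# Option 2: a tidy "long" table with one row per (study, criterion) pair
inclusion_long <- data.frame(
  name      = rep(studies$name, lengths(studies$inclusion)),
  criterion = unlist(studies$inclusion),
  stringsAsFactors = FALSE
)

# Comparing criteria across studies then becomes a cross-tabulation
table(inclusion_long$criterion, inclusion_long$name)
```

The long layout is usually the easier one to compare across studies, because every criterion becomes a plain value in a single column rather than an element buried inside a list.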
