
How to optimise an R function with 2 inputs within a loop

I am new to R and surprised at how long it takes to run what I believe are fairly simple lines of code, which makes me think I am missing something obvious. I have searched the internet and tried a few variations of the function, but none has improved the run time.

Extract is a data frame with 18.5m rows and 11 variables. I am trying to establish two things: first, the percentage of all patients who stay in hospital for 7 days or longer; and second, stays of 21 days or longer as a proportion of those 7-day stays.

LOS_prob_providerage <- function(x, y) {
  Var1 = which(Extract$LOS >= 0 & Extract$ProviderCode == x & Extract$age_group == y)
  Var2 = which(Extract$LOS >= 7 & Extract$ProviderCode == x & Extract$age_group == y)
  return(list(
    Strand = (sum(Extract$LOS[Var1] >= 7) / length(Var1)) * 100,
    ELOS = (sum(Extract$LOS[Var2] >= 21) / length(Var2)) * 100
  ))
}

When I call this function I pass a list of hospitals as the x argument and a single age group from a list as the y argument (I can't get it to take both as lists and output all hospitals for all age groups), using the following code:

Providerage_prob_strand = mapply(LOS_prob_providerage,Provider_unique, agelabels[1], SIMPLIFY = FALSE)

I then create a data frame from the 2 lists that the function outputs, using the code below:

National = data.frame(matrix(unlist(Providerage_prob_strand), ncol = 2, byrow = TRUE),
                      row.names = Provider_unique)
colnames(National) <- c("Stranded_010", "ELOS_010")

I subsequently re-run the last portions of code for all 11 elements in my age group list and append to the National data frame.

Question 1: Is there a less computationally intensive way to code my loop in R, or is the loop just taking that long because of the way R stores everything in memory?

Question 2: Is there any way to give R two lists for both the x and y variables using mapply/sapply, and have it output the results for both Strand and ELOS across all hospitals and age groups?

I would use the data.table package for this.

Some dummy data to demonstrate (usually it is good practice for the question asker to provide this):

set.seed(123)
df1 = data.frame(
  provider = sample(LETTERS[1:4], 1000, T),
  los = round(runif(1000,0,40)),
  age_group = sample(1:4,1000, T))

Now we turn this into a data table

library(data.table)
setDT(df1)

and we can extract the values you want like this:

providerlist = c('A','B')
age_list = c(1,2)

df1[provider %in% providerlist & age_group %in% age_list,
  .(los_greater_than7 = 100*sum(los>7)/.N),
  keyby = .(provider, age_group)]
#    provider age_group los_greater_than7
# 1:        A         1          92.40506
# 2:        A         2          81.81818
# 3:        B         1          77.27273
# 4:        B         2          87.50000

df1[provider %in% providerlist & age_group %in% age_list & los>7,
  .(los_greater_than20 = 100*sum(los>20)/.N),
  by = .(provider, age_group)]
#    provider age_group los_greater_than20
# 1:        A         1           56.16438
# 2:        A         2           66.66667
# 3:        B         1           56.86275
# 4:        B         2           58.92857
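
As a possible extension of the approach above, both percentages can be computed in a single grouped call, so the table is scanned once rather than once per hospital and age group. The sketch below is an illustration on the same dummy data, not a tested solution for the real Extract table. The output column names Strand and ELOS are borrowed from the question's function, and the thresholds use >= 7 and >= 21 to match that function (the two examples above use the strict comparisons > 7 and > 20). Omitting the provider and age group filter returns every combination, which is what Question 2 asks for.

```r
library(data.table)

# Same dummy data as above (assumed column names: provider, los, age_group)
set.seed(123)
df1 <- data.table(
  provider  = sample(LETTERS[1:4], 1000, TRUE),
  los       = round(runif(1000, 0, 40)),
  age_group = sample(1:4, 1000, TRUE)
)

# One pass over the table: both metrics for every provider / age group pair.
# Strand: share of all stays (los >= 0) lasting 7 days or more.
# ELOS:   share of those 7+ day stays lasting 21 days or more
#         (NaN for a group with no 7+ day stays).
result <- df1[los >= 0,
  .(Strand = 100 * sum(los >= 7) / .N,
    ELOS   = 100 * sum(los >= 21) / sum(los >= 7)),
  keyby = .(provider, age_group)]

# Optional: one row per provider, one pair of columns per age group,
# similar in layout to the National data frame in the question.
wide <- dcast(result, provider ~ age_group, value.var = c("Strand", "ELOS"))
```

On the real data the same call would be applied after setDT(Extract), with ProviderCode, LOS and age_group substituted for the dummy column names.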
