Analysing multiple datasets in R

I have the following R code to analyse a single dataset:

da_if <- function(x, TYPE) {
  # Skip the first 61 lines (they only contain settings) and read six whitespace-separated columns
  data <- read.csv(x, sep = "", skip = 61, na.strings = "NA", header = FALSE,
                   col.names = paste0("V", seq_len(6)), fill = TRUE)
  data$V2 <- as.factor(data$V2)
  # V5 and V6 must be numeric so they can be summed below; for two-word labels the
  # second word lands in V5 and becomes NA here (as.character guards against factor columns)
  data$V5 <- as.numeric(as.character(data$V5))
  data$V6 <- as.numeric(as.character(data$V6))
  y <- split(data, data$V2)
  if (TYPE == "SCALES") {
    y <- y$SCALES
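    y[is.na(y)] <- 0       # missing or non-numeric entries count as 0
    y$V5 <- y$V5 + y$V6    # the rating sits in V5 (one-word label) or V6 (two-word label)
    y <- y[, -6]           # drop V6 now that V5 holds the rating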
    return(y)
  } else if (TYPE == "TRACK") {
    y <- y$TRACK
    return(y)
  } else if (TYPE == "RESMAN") {
    y <- y$RESMAN
    return(y)
  } else if (TYPE == "COMMUN"){
    y <- y$COMMUN
    return(y)
  } else {print("Insert valid datatype...")}
}

And I have a list of files to analyse, generated by this code:

fta <- list.files(pattern = "\\.log$", full.names = TRUE) # files to analyse

Is there any way to change my function so that it analyses all 32 datasets at once? I have tried using apply, but I couldn't work out how to store the results in 32 different variables for further analysis, and I haven't managed to write a for loop that actually does anything. For now, I'd just like to analyse the SCALES part of my data.

Thank you all in advance!!

Edit: My raw data looks something like this; it comes from the NASA Toolbox Multitasking Exercise:

14:29:00.467154     TRACK   STATE   CURSOR  X   0.012751340110832256
14:29:00.467154     TRACK   STATE   CURSOR  Y   -0.08704373265652304
14:29:00.487683     TRACK   STATE   CURSOR  X   0.012479403159392622
14:29:00.488668     TRACK   STATE   CURSOR  Y   -0.08733692625790845
14:29:00.491681     MAIN    STATE       PAUSE
14:29:00.515652     MAIN    STATE   GENERICSCALES   START
14:30:53.308644     SCALES  INPUT   Mentale Anforderung 7
14:30:53.309640     SCALES  INPUT   Körperliche Beanspruchung   6
14:30:53.310467     SCALES  INPUT   Zeitdruck   5
14:30:53.311462     SCALES  INPUT   Leistung    3
14:30:53.311462     SCALES  INPUT   Anstrengung 7
14:30:53.312459     SCALES  INPUT   Frustration 5
14:30:53.316458     MAIN    STATE       RESUME
14:30:53.319470     MAIN    STATE   PUMPSTATUS  STOP
14:30:53.320461     MAIN    STATE   RESMAN  STOP
14:30:53.321456     MAIN    STATE   SYSMON  STOP
14:30:53.322470     MAIN    STATE   COMMUNICATIONS  STOP

Don't store data in 32 different variables. Use lapply and store data in a list:

list_data <- lapply(fta, da_if, TYPE = 'SCALES')
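To keep track of which result belongs to which file, the list elements can be named after the files. A minimal sketch, assuming made-up file names and a toy stand-in (toy_da_if, hypothetical) in place of the real da_if so that it runs without the log files:

```r
# Hypothetical file names standing in for the real `fta` vector
fta <- c("./subject01.log", "./subject02.log")

# Toy stand-in for da_if(): returns a small data frame per file (values are made up)
toy_da_if <- function(path, TYPE) {
  data.frame(V4 = c("Zeitdruck", "Leistung"), V5 = c(5, 3))
}

list_data <- lapply(fta, toy_da_if, TYPE = "SCALES")
names(list_data) <- basename(fta)   # label each element with its file name

list_data[["subject01.log"]]        # access one dataset by name
list_data[[2]]                      # or by position
sapply(list_data, nrow)             # quick check: number of rows per file
```

With the real data, replacing toy_da_if with da_if should give one named data frame per log file, which takes the place of 32 separate variables.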

If you want to combine the list into a single data frame, you can then use

combine_data <- do.call(rbind, list_data)
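After rbind, the rows no longer show which file they came from. One possible way around this is to add a column with the file name before combining. The sketch below assumes the list is named by file; the toy data frames and the `file` column name are made up for illustration:

```r
# Toy list standing in for list_data (names and values are made up)
list_data <- list(
  "subject01.log" = data.frame(V4 = c("Zeitdruck", "Leistung"), V5 = c(5, 3)),
  "subject02.log" = data.frame(V4 = c("Zeitdruck", "Leistung"), V5 = c(6, 4))
)

# Add a `file` column to each data frame before stacking them
tagged <- Map(function(df, name) {
  df$file <- name
  df
}, list_data, names(list_data))

combine_data <- do.call(rbind, tagged)
rownames(combine_data) <- NULL      # drop the auto-generated row names

# Example follow-up: mean rating per scale across all files
aggregate(V5 ~ V4, data = combine_data, FUN = mean)
```

Keeping the file name as a column means later steps such as aggregate or subset can still group or filter by dataset.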
