List to dataframe using names as values for column in R

I have 88 tab-separated files that I need to import into R.

They are named like "Study-1-12", where:

  • Study: name of study
  • 1: subject id
  • [1]2: experimental day (either 1 or 2)
  • 1[2]: trial (either 1 or 2)
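
The naming scheme above can be split into its parts with a regular expression. This is a minimal base R sketch, assuming the files carry a ".dat" extension and that day and trial are always a single digit (1 or 2); the file names and the `meta` variable are made up for illustration.

```r
# Hypothetical file names following the "Study-<id>-<day><trial>" pattern
fnames <- c("rawdata/Study-1-12.dat", "rawdata/Study-30-21.dat")

# Strip the directory and the extension before matching
base <- tools::file_path_sans_ext(basename(fnames))

# One capture group per component; the last two digits are day and trial
pattern <- "^([[:alnum:]]+)-([[:digit:]]+)-([12])([12])$"
parts <- regmatches(base, regexec(pattern, base))

# Each element of `parts` is c(full match, study, id, day, trial)
meta <- do.call(rbind, lapply(parts, function(p) {
  data.frame(study = p[2],
             id    = as.integer(p[3]),
             exp   = as.integer(p[4]),
             trial = as.integer(p[5]),
             stringsAsFactors = FALSE)
}))

meta
```

Anchoring the pattern with `^` and `$` makes a file name that does not follow the scheme fail to match instead of being parsed incorrectly.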

The data in each file look like this:

START: dd.mm.yyy hh:mm:ss

WAITING 3780    ms      REACTION    1230  ms

WAITING 9700    ms      REACTION    377 ms


WAITING 5538    ms      REACTION    310 ms

WAITING 4599    ms      REACTION    361 ms

WAITING 9579    ms      REACTION    338 ms
END: dd.mm.yyy hh:mm:ss

So far I have imported all of them into a list and summarised each one, so the end result for each file is a table with two columns, "waiting" and "reaction", each holding a single mean value.

library(readr)  # read_tsv()
library(dplyr)  # %>%, select(), filter(), summarise()

# Load file paths and names (pattern is a regular expression, not a glob)
filepath <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = TRUE) # full paths
filenames <- basename(filepath) # file names only

# Load all files into a list with named column headers
ldf <- lapply(filepath, function(x) read_tsv(file = x, skip = 1,
              col_names = c("waiting", "valueW", "ms", "ws", "reaction", "valueR", "ms1")))

names(ldf) <- filenames # name the list items after their files

# Select only the relevant columns, drop the END line, and compute the means
ldf <- lapply(ldf, function(x) x %>% 
                select(waiting, valueW, reaction, valueR) %>%
                filter(waiting == "WAITING") %>%
                summarise(waiting = mean(valueW), reaction = mean(valueR))
              )

Now I would like to create a single data frame whose columns are derived from the file name (as above: "Study-1-12") together with the summarised values:

  • id: the first 1
  • exp: 1 or 2
  • trial: 1 or 2
  • waiting: the value from each data frame in the list
  • reaction: the value from each data frame in the list
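
One way to get from the named list to that layout is to stack the list with the names as a column and then split that column. The following is only a sketch, assuming dplyr and tidyr are installed; the list `ldf` here is a hand-built stand-in for the summarised list (the first entry uses the means of the sample data shown earlier, the second is invented), and `result` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Stand-in for the summarised list: one one-row data frame per file,
# named by file name
ldf <- list(
  "Study-1-12.dat" = data.frame(waiting = 6639.2, reaction = 523.2),
  "Study-2-21.dat" = data.frame(waiting = 5000,   reaction = 400)
)

result <- bind_rows(ldf, .id = "file") %>%      # list names become column "file"
  tidyr::extract(file,
                 into    = c("id", "exp", "trial"),
                 regex   = "-([0-9]+)-([12])([12])",
                 convert = TRUE)                # turn the captured digits into integers

result
```

`bind_rows()` with `.id` keeps the file name next to each row, so no separate loop over `names(ldf)` is needed.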

Is there a way to do this in R?

library(purrr)
library(stringi)

fils <- list.files("~/Data/so", full.names=TRUE)

fils
## [1] "/Some/path/to/data/studyA-1-12"  "/Some/path/to/data/studyB-30-31"

map_df(fils, function(x) {

  # Pull study name, subject id, experiment day and trial out of the file name
  stri_match_all_regex(basename(x), "([[:alnum:]]+)-([[:digit:]]+)-([[:digit:]])([[:digit:]])")[[1]] %>%
    as.list() %>%
    .[2:5] %>%   # drop the full match, keep the four capture groups
    set_names(c("study_name", "subject_id", "experiment_day", "trial")) -> meta

  # Keep only the WAITING lines and take the two numeric fields (2nd and 5th)
  readLines(x) %>%
    grep("WAITING", ., value=TRUE) %>%
    map(~scan(text=., quiet=TRUE,
              what=list(character(), double(), character(),
                                character(), double(), character()))[c(2,5)]) %>%
    map_df(~set_names(as.list(.), c("waiting", "reaction"))) -> df

  # Attach the file-name metadata to every row of this file
  df$study_name <- meta$study_name
  df$subject_id <- meta$subject_id
  df$experiment_day <- meta$experiment_day
  df$trial <- meta$trial

  df

})
## # A tibble: 10 × 6
##    waiting reaction study_name subject_id experiment_day trial
##      <dbl>    <dbl>      <chr>      <chr>          <chr> <chr>
## 1     3780     1230     studyA          1              1     2
## 2     9700      377     studyA          1              1     2
## 3     5538      310     studyA          1              1     2
## 4     4599      361     studyA          1              1     2
## 5     9579      338     studyA          1              1     2
## 6     3780     1230     studyB         30              3     1
## 7     9700      377     studyB         30              3     1
## 8     5538      310     studyB         30              3     1
## 9     4599      361     studyB         30              3     1
## 10    9579      338     studyB         30              3     1
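
The output above keeps one row per WAITING line, while the question asks for a single mean per file. A possible follow-up step, sketched here under the assumption that dplyr (1.0 or later, for `.groups`) is available; `long` is a hand-built copy of the printed tibble and `means` is a hypothetical name.

```r
library(dplyr)

# Stand-in for the tibble printed above (same values, rebuilt by hand)
long <- data.frame(
  waiting        = rep(c(3780, 9700, 5538, 4599, 9579), times = 2),
  reaction       = rep(c(1230, 377, 310, 361, 338), times = 2),
  study_name     = rep(c("studyA", "studyB"), each = 5),
  subject_id     = rep(c("1", "30"), each = 5),
  experiment_day = rep(c("1", "3"), each = 5),
  trial          = rep(c("2", "1"), each = 5),
  stringsAsFactors = FALSE
)

# One row per file: group by the file-name metadata and average the timings
means <- long %>%
  group_by(study_name, subject_id, experiment_day, trial) %>%
  summarise(waiting  = mean(waiting),
            reaction = mean(reaction),
            .groups  = "drop")

means
```

Grouping by all four metadata columns keeps them in the result, so each row still identifies the file it came from.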
