
How to insert placeholders for columns without data points using R?

I have performed several experiments and analysed them with our software. For each experiment, the software produces a separate folder containing a .txt file called "DistList" if it was able to analyse the images. If it was not able to do so, there is no .txt file. In general, the folder arrangement looks like this when there is a DistList:

[Image: folder arrangement]

To take all those .txt files together, I have already made an R script:

setwd("~/Desktop/Results/.")

fileList <- list.files(path = ".", recursive = TRUE, pattern = "DistList.txt", full.names = TRUE)

listData <- lapply(fileList, read.table)

names(listData) <- basename(dirname(fileList))

library(tidyverse)
library(reshape2)

bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%
  dcast(rowNum~FileName, value.var = "V1") %>%
  select(-rowNum) %>%
  write.csv(file="Result.csv")

In this form, it now yields a document with the following structure, as there is no DistList.txt in A03 and A04:

A01    A02    A05
103    118    558
225    545    779
228    666    898
553    1002   1883
966    2004   NA
1112   3332   NA
NA     4556   NA
NA     5596   NA
NA     6639   NA

However, I would like the folders that contain no DistList.txt to be listed in the resulting .csv file as well, such as:

A01    A02    A03   A04   A05
103    118    NA    NA    558
225    545    NA    NA    779
228    666    NA    NA    898
553    1002   NA    NA    1883
966    2004   NA    NA    NA
1112   3332   NA    NA    NA
NA     4556   NA    NA    NA
NA     5596   NA    NA    NA
NA     6639   NA    NA    NA

But I don't know how to modify my script so that it yields a table like that. It would be no problem if I had only a few experiments, but in my case there are several hundred of those columns, and it would take too much time to verify manually whether anything is missing.

I would be very grateful if you could help me with this problem!

The easiest thing to do is to modify the first two lines, i.e. the file listing and the loading:

fileList = file.path(dir(path = ".", pattern = "A\\d+", full.names = TRUE), "DistList.txt")

This generates a list of file paths for all folders, even if the corresponding DistList.txt file doesn't exist. Next, we load the files that exist; for the others we just return a tibble containing a single NA (don't forget to load the tibble package before executing this function):

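load_if_exists = function (filename, ...) {
    tryCatch(
        # read.table warns and then errors for a missing file; hide the warning
        suppressWarnings(read.table(filename, ...)),
        # on any read error, fall back to a one-cell tibble holding NA
        error = function (x) tibble(NA)
    )
}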

listData = lapply(fileList, load_if_exists)
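
To see the effect end to end, here is a minimal, self-contained sketch on a throwaway folder layout. It is not the code from the question: it uses base R only, pads the columns by plain indexing instead of going through bind_rows and dcast, and the folder names, the sample values and the helper name read_or_na are made up for illustration.

```r
# Build a temporary layout: A01 and A03 contain data, A02 has no DistList.txt.
root <- file.path(tempdir(), "Results")
for (d in c("A01", "A02", "A03")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
writeLines(c("103", "225", "228"), file.path(root, "A01", "DistList.txt"))
writeLines(c("558", "779"), file.path(root, "A03", "DistList.txt"))

# One candidate path per folder, whether or not the file exists.
folders <- dir(path = root, pattern = "^A\\d+$", full.names = TRUE)
fileList <- file.path(folders, "DistList.txt")

# Hypothetical helper: return the first column, or a single NA if reading fails.
read_or_na <- function(filename) {
  tryCatch(
    suppressWarnings(read.table(filename))$V1,
    error = function(e) NA
  )
}

columns <- lapply(fileList, read_or_na)
names(columns) <- basename(folders)

# Indexing past the end of a vector yields NA, which pads the short columns.
n <- max(lengths(columns))
result <- as.data.frame(lapply(columns, function(x) x[seq_len(n)]))
print(result)
```

The A02 column is present and filled with NA, which is the placeholder behaviour asked for. With the list produced by load_if_exists, the original bind_rows pipeline should be usable unchanged, since the folder without data still contributes one row.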

Note that load_if_exists uses tryCatch rather than relying on file.exists. This is probably unimportant in your case, but in general you cannot rely on file existence checks because the file system is not synchronised, i.e. theoretically reading a file could fail even if the prior file existence check succeeded. tryCatch is therefore more robust in such a situation.

Unfortunately the file function, called internally within read.table, will stupidly create a warning in addition to an error for nonexistent files; we suppress this warning explicitly in the code above.
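
The double signal can be observed directly. The following is a small sketch, assuming a path under tempdir() that does not exist; the variable names are made up for illustration. It records each condition as it is raised instead of printing it.

```r
# A path that is assumed not to exist.
missing_file <- file.path(tempdir(), "no_such_folder", "DistList.txt")

signalled <- character()
outcome <- withCallingHandlers(
  tryCatch(
    read.table(missing_file),
    # the error ends the read; record it and return NULL instead
    error = function(e) {
      signalled <<- c(signalled, "error")
      NULL
    }
  ),
  # the warning is raised first; record it and keep it off the console
  warning = function(w) {
    signalled <<- c(signalled, "warning")
    invokeRestart("muffleWarning")
  }
)
print(signalled)
```

Both a warning and an error are recorded for the single failed read, which is why load_if_exists wraps read.table in suppressWarnings in addition to catching the error.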
