How to insert placeholders for columns without data points using R?
I have performed several experiments that I have analysed using our software. For each experiment, this software produces a separate folder containing a .txt file called "DistList" if the software was able to analyse the images. If it was not able to do so, there is no .txt file. In general, the folder arrangement looks like this if there is a DistList:
To combine all those .txt files, I have already written an R script:
setwd("~/Desktop/Results/.")
fileList <- list.files(path = ".", recursive = TRUE, pattern = "DistList.txt", full.names = TRUE)
listData <- lapply(fileList, read.table)
names(listData) <- basename(dirname(fileList))
library(tidyverse)
library(reshape2)
bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%
  dcast(rowNum ~ FileName, value.var = "V1") %>%
  select(-rowNum) %>%
  write.csv(file = "Result.csv")
In this form, it yields a document with the following structure, as there is no DistList.txt in A03 and A04:
A01 A02 A05
103 118 558
225 545 779
228 666 898
553 1002 1883
966 2004 NA
1112 3332 NA
NA 4556 NA
NA 5596 NA
NA 6639 NA
However, I would like the folders that contain no DistList.txt to be listed in the resulting .csv file as well, such as:
A01 A02 A03 A04 A05
103 118 NA NA 558
225 545 NA NA 779
228 666 NA NA 898
553 1002 NA NA 1883
966 2004 NA NA NA
1112 3332 NA NA NA
NA 4556 NA NA NA
NA 5596 NA NA NA
NA 6639 NA NA NA
But I don't know how to modify my script so that it yields a list like that. It would be no problem if I just had very few experiments. But in my case there are several hundred of those columns, and it would take too much time to verify manually whether anything is missing.
I would be very grateful if you could help me with this problem!
The easiest thing to do is to modify the first two lines, i.e. the file listing and the loading:
fileList = file.path(dir(path = ".", pattern = "A\\d+", full.names = TRUE), "DistList.txt")
This generates a list of files for all folders, even if the corresponding DistList.txt file doesn't exist. Next, we load them if they exist; otherwise we just return a tibble containing a single NA (don't forget to load the tibble package before executing this function):
load_if_exists = function (filename, ...) {
  tryCatch(
    suppressWarnings(read.table(filename, ...)),
    error = function (x) tibble(NA)
  )
}
listData = lapply(fileList, load_if_exists)
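To see the placeholder idea end to end, here is a minimal sketch that runs on made-up data in a temporary directory. The folder names, file contents, and the `Results_demo` directory are assumptions for illustration only. It also swaps two things relative to the code above: the fallback is `data.frame(V1 = NA)` (so that the placeholder shares the `V1` column name that `read.table` assigns), and the columns are padded with base R rather than with the `bind_rows`/`dcast` pipeline from the question.

```r
# Hypothetical demo data in a temporary directory (not the asker's real files).
root <- file.path(tempdir(), "Results_demo")
unlink(root, recursive = TRUE)
for (d in c("A01", "A02", "A03")) {
  dir.create(file.path(root, d), recursive = TRUE)
}
writeLines(c("103", "225", "228"), file.path(root, "A01", "DistList.txt"))
writeLines(c("558", "779"), file.path(root, "A03", "DistList.txt"))
# A02 deliberately has no DistList.txt.

# One path per experiment folder, whether or not the file exists.
folders <- dir(path = root, pattern = "^A\\d+$", full.names = TRUE)
fileList <- file.path(folders, "DistList.txt")

# Read the file if possible; otherwise return a one-row NA placeholder.
load_if_exists <- function(filename, ...) {
  tryCatch(
    suppressWarnings(read.table(filename, ...)),
    error = function(e) data.frame(V1 = NA)
  )
}

listData <- lapply(fileList, load_if_exists)
names(listData) <- basename(folders)

# Pad every column with NA up to the length of the longest one.
nRows <- max(vapply(listData, nrow, integer(1)))
result <- as.data.frame(lapply(listData, function(d) d$V1[seq_len(nRows)]))
print(result)
```

The resulting data frame has one column per folder, and the column for the folder without a DistList.txt consists only of NA. In the original script, the line `names(listData) <- basename(dirname(fileList))` still has to run after the loading step so that the columns carry the folder names.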
Note that load_if_exists uses tryCatch rather than relying on file.exists. This is probably unimportant in your case, but in general you cannot rely on file existence checks because the file system is not synchronised, i.e. theoretically reading a file could fail even if the prior file existence check succeeded. tryCatch is therefore more robust in such a situation.
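The difference between the two approaches can be sketched as follows. The function names `read_checked` and `read_attempt` and the path are hypothetical, chosen only for this illustration. For a file that is simply missing, both behave the same; they differ only if the file becomes unreadable between the check and the read, which this small example cannot reproduce.

```r
# Hypothetical path that does not exist.
missing_file <- file.path(tempdir(), "no_such_folder", "DistList.txt")

# Check-then-read: two separate steps. The file could disappear in between,
# in which case read.table would still raise an error.
read_checked <- function(filename) {
  if (file.exists(filename)) read.table(filename) else NULL
}

# Attempt-then-handle: a single read whose failure is caught,
# whatever the reason for the failure.
read_attempt <- function(filename) {
  tryCatch(
    suppressWarnings(read.table(filename)),
    error = function(e) NULL
  )
}

print(is.null(read_checked(missing_file)))
print(is.null(read_attempt(missing_file)))
```

One consequence of the second form is that it also swallows other read errors, such as an empty file, so the placeholder is returned in those cases too.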
Unfortunately the file function, called internally within read.table, will stupidly create a warning in addition to an error for nonexistent files; we suppress this warning explicitly in the code above.
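A small sketch can make the two separate conditions visible. It records which kinds of condition are signalled when read.table is pointed at a nonexistent file; the path and the `seen` variable are assumptions for illustration.

```r
# Hypothetical path that does not exist.
missing_file <- file.path(tempdir(), "no_such_folder", "DistList.txt")

seen <- character(0)
res <- tryCatch(
  withCallingHandlers(
    read.table(missing_file),
    warning = function(w) {
      # Record the warning, then silence it so it is not printed.
      seen <<- c(seen, "warning")
      invokeRestart("muffleWarning")
    }
  ),
  error = function(e) {
    # Record the error and return NULL instead of stopping.
    seen <<- c(seen, "error")
    NULL
  }
)
print(seen)
```

Wrapping the call in suppressWarnings, as in load_if_exists, discards the warning in one step when it does not need to be inspected.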