R loop between multiple data.frames and assign values to them
I'm using R to make some changes to cnvkit output (for my own purposes). The thing is: when I process one sample at a time, the script works like a charm, but when I put it into a for loop, it breaks!
I tried a lot of answers posted on Stack Overflow, but none of them helped me.
# Clear workspace
rm(list=(ls()))
ref <- read.csv("/path/to/reference.cnn", header=T, sep="\t")
path <- "/path/to/call_files/"
files = list.files(path = path, pattern = "*.final.call.cnr", full.names=FALSE)
for(file in files) {
perpos <- which(strsplit(file, "")[[1]]==".")
assign(
gsub(" ","",substr(file, 1, perpos-1)),
read.csv(paste(path,file,sep=""), header=T, sep="\t"))
}
mod_CNV = function(x) {
# Merge both files by "start" position
merged <- merge(files[i], ref, by="start", suffixes=c(".files[i]", ".ref"))
# Round "log2" column
merged$log2.D00893 <- round(merged$log2.files[i], digits=1)
# re-calculate "cn" based on log2 correction
merged$cn <- round(2*(2^(merged$log2.files[i])))
# Subset file with all "cn" values that are not 2
alt.cn <- subset(merged, merged$cn !=2)
# Create new data with columns of interest
alt.cns <- as.data.frame(alt.cn[, c(1:8,13)])
# Re-order columns for better view
alt.cns <- alt.cns[c(2,1,3,4,6,5,8,7,9)]
# Calculate ratio between coverages
alt.cns$depth.ratio <- round(alt.cns$depth.files[i] / alt.cns$depth.ref, digits=2)
alt.cns$depth.ratio.1 <- round(alt.cns$depth.files[i] / alt.cns$depth.ref, digits=2)
## Function to call for DUP or DEL.
alt.cns$SV_type <- ifelse(alt.cns$cn < 2, "DEL", "DUP")
# Convert "alt.cns" to .bed file
full <- alt.cns[c(1,2,3,12,5,4,6,7,8,9,10)]
names(full)[1] <- "#Chrom"
names(full)[2] <- "Start"
names(full)[3] <- "End"
names(full)[4] <- "SV_type"
names(full)[6] <- "gene"
names(full)[7] <- "log2"
# Save "alt.cns" as .bed file
write.table(full, file="/path/to/output/files[i].bed", row.names=F, col.names=T, sep="\t")
# Filter "alt.cns" file
filtered <- subset(alt.cns, alt.cns$depth.ratio < 0.70 | alt.cns$depth.ratio > 1.40 & alt.cns$weight > 0.3)
filtered <- filtered[c(1,2,3,12,5,4,6,7,8,9,10)]
names(filtered)[1] <- "#Chrom"
names(filtered)[2] <- "Start"
names(filtered)[3] <- "End"
names(filtered)[4] <- "SV_type"
names(filtered)[6] <- "gene"
names(filtered)[7] <- "log2"
#Save file
write.table(filtered, file="/path/to/output/files[i].bed", row.names=F, col.names=T, sep="\t")
}
for ( i in seq_along(files)) {
mod_CNV(files[i])
}
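Separately from the loop problem, the filtering step in mod_CNV may not do what it looks like. In R, `&` binds more tightly than `|`, so `depth.ratio < 0.70 | depth.ratio > 1.40 & weight > 0.3` applies the weight cut-off only to the `> 1.40` branch. A minimal sketch with made-up vectors (hypothetical values, not real cnvkit output) shows the difference:

```r
# Toy values standing in for the three filter columns (hypothetical data)
depth.ratio <- c(0.5, 0.5, 1.6, 1.6)
weight      <- c(0.1, 0.9, 0.1, 0.9)

# As written in the question, parsed as:
# depth.ratio < 0.70 | (depth.ratio > 1.40 & weight > 0.3)
as_written <- depth.ratio < 0.70 | depth.ratio > 1.40 & weight > 0.3

# If the weight cut-off should apply to both branches, parenthesise the |
both_branches <- (depth.ratio < 0.70 | depth.ratio > 1.40) & weight > 0.3

print(as_written)     # TRUE TRUE FALSE TRUE
print(both_branches)  # FALSE TRUE FALSE TRUE
```

Which of the two is intended depends on the analysis; the sketch only shows how R groups the operators.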
What I expect is that the loop reads the files one by one, assigns each file name to the variable files[i], and saves the result as .pdf. But I'm getting an error right at the beginning of the code:
"Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column".
For some reason, the loop isn't recognizing my files[i] variable, which is causing this error. Can someone help me with this problem? To be clear, this error doesn't occur when I run the code sample by sample, outside the loop.
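The error can be reproduced without any files. In this sketch, `ref`, `file_name` and `sample_df` are toy stand-ins (hypothetical values, not the asker's real objects): passing a character string to `merge()` makes R convert it into a one-column data frame that has no `start` column, which triggers exactly this message, while merging a data frame that was actually read in works.

```r
# A tiny stand-in for the reference table (hypothetical values)
ref <- data.frame(start = c(100, 200), log2 = c(0.1, -0.9))

# files[i] is just a character string such as this hypothetical file name
file_name <- "sample1.final.call.cnr"

# merge() turns the string into a one-column data frame with no "start"
# column, so it stops with the same error as in the question
msg <- tryCatch(
  merge(file_name, ref, by = "start"),
  error = function(e) conditionMessage(e)
)
print(msg)

# With a real data frame there is a "start" column to join on
sample_df <- data.frame(start = c(100, 200), log2 = c(0.2, -1.1))
merged <- merge(sample_df, ref, by = "start", suffixes = c(".sample", ".ref"))
print(names(merged))  # "start" "log2.sample" "log2.ref"
```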
Welcome to Stack Overflow!
You've declared a function:
mod_CNV = function(x) {
# Merge both files by "start" position
merged <- merge(files[i], ref, by="start", suffixes=c(".files[i]", ".ref"))
.
.
.
}
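From what I can tell, this function has no reliable way of knowing what i is, and files[i] is in any case only the file name (a character string), not the data frame you read in with assign(). merge() turns that string into a one-column data frame without a "start" column, which is what the error "'by' must specify a uniquely valid column" is complaining about.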
Here is where i is defined:
for ( i in seq_along(files)) {
mod_CNV(files[i])
}
i is created by the for loop. An R for loop does not open a new scope, so i actually ends up in the global environment, where mod_CNV can still find it; relying on that is fragile, though. If you want a value to be available inside mod_CNV, pass it in as a parameter.
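A quick way to see how R actually treats the loop variable: in this sketch (hypothetical names `show_i` and `files`), a function that never receives i still finds it through lexical scoping, and i survives after the loop. It works here only because of where the function happens to be defined, which is why passing the value explicitly is the safer habit.

```r
# A function that never receives i as an argument
show_i <- function() i

files <- c("a.cnr", "b.cnr")  # hypothetical file names
seen <- character(0)
for (i in seq_along(files)) {
  # The loop assigns i in the surrounding environment, so show_i() can see it
  seen <- c(seen, files[show_i()])
}
print(seen)  # "a.cnr" "b.cnr"
print(i)     # 2: the loop variable still exists after the loop ends
```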
What you are passing in to mod_CNV is the file name. Inside mod_CNV this file name is referred to as x, yet I don't see anywhere inside mod_CNV where you use x.
This is how you should declare your function and make use of the file name you are passing in:
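mod_CNV = function(filename) {
# Read the file whose name was passed in, then merge it by "start" position
file_df <- read.csv(paste0(path, filename), header=TRUE, sep="\t")
merged <- merge(file_df, ref, by="start", suffixes=c(paste0(".", filename), ".ref"))
.
.
.
# replace all other occurrences of `files[i]` with `filename`,
# e.g. merged[[paste0("log2.", filename)]] instead of merged$log2.files[i]
}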
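The skeleton above can be filled in along these lines. This is a minimal, self-contained sketch: it writes a tiny hypothetical sample file to a temporary folder, and the toy `ref` table and column values are made up. For brevity it still reads `path` and `ref` from the enclosing environment; passing them as extra arguments would be cleaner.

```r
# Hypothetical input: a tiny tab-separated sample file in a temp folder
path <- paste0(tempdir(), "/")
writeLines(c("start\tlog2\tdepth", "100\t0.62\t30", "200\t-1.05\t8"),
           paste0(path, "sample1.cnr"))
ref <- data.frame(start = c(100, 200), log2 = c(0, 0), depth = c(20, 20))

mod_CNV <- function(filename) {
  # Read the file named by the argument instead of relying on files[i]
  sample_df <- read.csv(paste0(path, filename), header = TRUE, sep = "\t")
  merged <- merge(sample_df, ref, by = "start",
                  suffixes = c(paste0(".", filename), ".ref"))
  # Columns present in both tables now carry the file name, e.g. "log2.sample1.cnr"
  log2_col <- paste0("log2.", filename)
  merged$cn <- round(2 * 2^merged[[log2_col]])
  merged
}

result <- mod_CNV("sample1.cnr")
print(result)
```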
And you can loop through the list of files and call mod_CNV like this, without using i:
for (file in files) {
mod_CNV(file)
}
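Since the question title is about looping over several data frames, one common R alternative to creating a global variable per file with assign() is to keep all of them in a named list. This sketch uses two hypothetical files in a temporary folder; note that `pattern` in list.files() is a regular expression, not a shell glob, so the dots are escaped.

```r
# Hypothetical setup: two small tab-separated files in a temporary folder
path <- file.path(tempdir(), "cnv_demo")
dir.create(path, showWarnings = FALSE)
writeLines(c("start\tlog2", "100\t0.6"), file.path(path, "S1.final.call.cnr"))
writeLines(c("start\tlog2", "100\t-1.2"), file.path(path, "S2.final.call.cnr"))

# Regex: file names ending in ".final.call.cnr"
files <- list.files(path, pattern = "\\.final\\.call\\.cnr$")

# Read every file into one list instead of one variable per file
cnv_list <- lapply(files, function(f) read.csv(file.path(path, f), sep = "\t"))
# Name each element by the part of the file name before the first dot
names(cnv_list) <- sub("\\..*$", "", files)

print(names(cnv_list))  # "S1" "S2"
print(cnv_list$S2$log2)
```

Each element can then be handed to a function that takes a data frame, for example with lapply(cnv_list, ...), without ever touching files[i].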
Also, I haven't used merge before and I don't know exactly what you are trying to do, but I find it odd to use an entire file name as a suffix. It may be what you intended, though.
Anyway, this should be enough information for you to resolve your issue.
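Using the file name as a suffix does have a consequence for how the columns must be accessed. `$` takes the text after it literally, so `merged$log2.files[i]` looks for a column named "log2.files" and then indexes the result, rather than building the name from `files[i]`. This sketch (hypothetical file name and toy data) shows why building the name as a string and using `[[ ]]` works instead:

```r
file <- "S1.cnr"  # hypothetical file name
merged <- merge(data.frame(start = 1:2, log2 = c(0.6, -1.2)),
                data.frame(start = 1:2, log2 = c(0, 0)),
                by = "start", suffixes = c(paste0(".", file), ".ref"))
print(names(merged))  # "start" "log2.S1.cnr" "log2.ref"

i <- 1
# No column called "log2.files" exists, so this is NULL[1], i.e. NULL
print(merged$log2.files[i])

# Build the column name as a string and index with [[ ]]
col <- sprintf("log2.%s", file)
print(merged[[col]])  # 0.6 -1.2
```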
For those who run into the same problem I did, here is the working code:
path <- "/path/to/files/"
files = list.files(path = path, pattern = "\\.file\\.ext$", full.names=FALSE)
for(file in files) {
perpos <- which(strsplit(file, "")[[1]]==".")
assign(
gsub(" ","",substr(file, 1, perpos[1]-1)),
read.csv(paste0(path, file), header=TRUE, sep="\t"))
}
s_ref <- read.csv("/read/ref/file", header=TRUE, sep="\t")
s_ref[["depth.ref.norm"]] <- round(s_ref[["depth"]]/mean(s_ref[["depth"]]), digits=2)
mod_CNV = function(file) {
# list.files() returned bare file names, so prepend the folder
file_df <- read.csv(paste0(path, file), header=TRUE, sep="\t")
# Normalize $depth by mean
file_df[sprintf("depth.%s.norm", file)] <- round(file_df[["depth"]]/mean(file_df[["depth"]]), digits=2)
# Merge both files by "start" position
merged <- merge(file_df, s_ref, by="start", suffixes=c(sprintf(".%s", file), ".ref"), all=TRUE)
# Round "log2" column
log2_col_name = sprintf("log2.%s", file)
merged[log2_col_name] <- round(merged[[log2_col_name]], digits=1)
# re-calculate "cn" based on log2 correction
merged["cn"] <- round(2*(2^(merged[[log2_col_name]])))
# Subset file with all "cn" values that are not 2
alt_cn <- subset(merged, merged[["cn"]] != 2)
# Create new data with columns of interest
alt_cns <- as.data.frame(alt_cn[, c(1:9,14,18)])
# Re-order columns for better view
alt_cns <- alt_cns[c(2,1,3,4,6,5,8,7,9,10,11)]
# Calculate ratio between coverages
alt_cns["depth.ratio.norm"] <- round(alt_cns[[sprintf("depth.%s.norm", file)]] / alt_cns[["depth.ref.norm"]], digits=2)
alt_cns["depth.ratio"] <- round(alt_cns[[sprintf("depth.%s", file)]] / alt_cns[["depth.ref"]], digits=2)
## Function to call for DUP or DEL.
alt_cns["SV_type"] <- ifelse(alt_cns$cn < 2, "DEL", "AMP")
# Convert "alt.cns" to .bed file
full <- alt_cns[c(1,2,3,14,5,4,6,7,8,9,10,11,12,13)]
names(full)[1] <- "#Chrom"
names(full)[2] <- "Start"
names(full)[3] <- "End"
names(full)[4] <- "SV_type"
names(full)[6] <- "gene"
names(full)[7] <- "log2"
full["weight"] <- round(full[["weight"]], digits = 2)
full <- full[order(full[["#Chrom"]]),]
# Save "full" as .bed file
output_file = sprintf("/path/%s.bed", file)
write.table(full, file=output_file, row.names=FALSE, col.names=TRUE, sep="\t", dec=",")
}
print(files)
for (file in files) {
mod_CNV(file)
}
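The copy-number step in the working code can be checked in isolation. This sketch uses made-up log2 values (hypothetical, not from a real cnvkit file) to show how round(2 * 2^log2) maps a rounded log2 ratio to an integer copy number for a diploid region, which rows are dropped because cn equals 2, and how SV_type is then assigned:

```r
# Hypothetical log2 values as they might appear in a .cnr file
log2_vals <- c(-1.1, -0.4, 0.02, 0.62)
log2_rounded <- round(log2_vals, digits = 1)  # -1.1 -0.4 0.0 0.6

# Same formula as the answer: diploid copy number scaled by 2^log2
cn <- round(2 * 2^log2_rounded)               # 1 2 2 3

# Keep only altered copy numbers, then label them as in the answer
altered <- cn != 2
sv_type <- ifelse(cn[altered] < 2, "DEL", "AMP")
print(sv_type)                                # "DEL" "AMP"
```

Note that a moderate loss such as log2 = -0.4 still rounds to cn = 2 and is filtered out.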